AutoHotkey Community

All times are UTC [ DST ]




Posted: May 21st, 2010, 1:53 pm

Joined: April 19th, 2007, 8:05 am
Posts: 48
I have a List containing 100k+ lines of short text with no spaces. It looks like:

...
asdyaier1w3
dflkj1345sdd
community
comrsddt55f
...

I also have a Dictionary file with each line an English word.

How do I refine it so that only Dictionary words are left in the List?

Both my List and Dictionary have 100k+ entries.


Last edited by dvda2k on May 21st, 2010, 1:56 pm, edited 1 time in total.

Posted: May 21st, 2010, 1:55 pm

Joined: May 27th, 2007, 9:41 am
Posts: 4999
You could try http://www.autohotkey.net/~hugov/tf-lib ... _Substract
Quote:
Delete lines from file1 in file2 (using StringReplace)
that way you are left with a list of words that don't occur.

_________________
AHK FAQ
TF : Text files & strings lib, TF Forum


Posted: May 21st, 2010, 1:58 pm

Joined: April 19th, 2007, 8:05 am
Posts: 48
hugov wrote:
You could try http://www.autohotkey.net/~hugov/tf-lib ... _Substract
Quote:
Delete lines from file1 in file2 (using StringReplace)
that way you are left with a list of words that don't occur.


Thank you. I didn't expect an answer in 30 seconds!

The function looks like what I'm looking for, great work! I'll report back on how efficiently it deals with 100k+ lines.


Posted: May 21st, 2010, 2:21 pm

Joined: April 19th, 2007, 8:05 am
Posts: 48
Unfortunately, this function looks like exactly the opposite of what I'm trying to do. :cry:

hugov, can you make a TF_Preserve(File1, File2, PartialMatch = 0) which keeps all lines matched in File2?


Posted: May 21st, 2010, 2:45 pm

Joined: May 27th, 2007, 9:41 am
Posts: 4999
Try this
Code:
list1= ; replace with: fileread, list1, file1.txt
(join`r`n
hello1
hello2
hello3
hello4
)

list2= ; replace with: fileread, list2, file2.txt
(join`r`n
hello3
hello7
ahello4
hello8
hello2
x
)

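Loop, Parse, list1, `n, `r
   {
    If (A_LoopField = "") ; skip blank lines so they are not reported as matches
       Continue
    ; whole-line, case-insensitive match; \Q...\E makes the word a literal string
    If (RegExMatch(list2, "im)^\Q" . A_LoopField . "\E$" ) > 0)
       Keep .= A_LoopField "`n"
   }
MsgBox These words from file1 occur in file2:`n%Keep%
; replace the MsgBox with: fileappend, %keep%, foundthese.txt
; you may need to add a filedelete if you run the script on a regular basis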
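
With 100k+ entries on both sides, the RegExMatch approach above searches list2 again for every line of list1, which may get slow. A possible alternative, sketched here under the assumption that AutoHotkey_L (v1.1+) with associative arrays is available, is to load the dictionary into an object once and test each line against it. KeepDictionaryWords and the "w:" key prefix are hypothetical names chosen for this illustration and are not part of the TF library; the file names are the placeholders used above.

```autohotkey
; Sketch only: assumes AutoHotkey_L (v1.1+); file names are placeholders.
FileRead, list1, file1.txt          ; the List to refine
FileRead, list2, file2.txt          ; the Dictionary, one word per line
keep := KeepDictionaryWords(list1, list2)
FileDelete, foundthese.txt          ; avoid appending to an older result
FileAppend, %keep%, foundthese.txt
ExitApp

; Hypothetical helper: returns the lines of list that also occur in dictionary.
KeepDictionaryWords(list, dictionary)
{
    words := {}
    Loop, Parse, dictionary, `n, `r
    {
        ; The "w:" prefix keeps words such as "base" or purely numeric
        ; lines from being treated specially as object keys.
        if (A_LoopField != "")
            words["w:" . A_LoopField] := true
    }
    keep := ""
    Loop, Parse, list, `n, `r
    {
        ; String keys are compared case-insensitively, like the "i" option above.
        if (A_LoopField != "" && ObjHasKey(words, "w:" . A_LoopField))
            keep .= A_LoopField "`n"
    }
    return keep
}
```

Each line of the List then costs one key lookup rather than a scan of the whole Dictionary text. Only whole-line matches are kept, so the PartialMatch option requested earlier is not covered by this sketch.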



Posted: May 21st, 2010, 3:53 pm

Joined: April 19th, 2007, 8:05 am
Posts: 48
Thanks, I'll try this.

One more question: how do I get the lines of a text file quickly? It took ReadFileLines a couple of minutes to read a 50k-line file.


Posted: May 21st, 2010, 4:47 pm

Joined: May 27th, 2007, 9:41 am
Posts: 4999
http://www.autohotkey.net/~hugov/tf-lib ... _ReadLines :wink:
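
Independent of the TF library, a common way to walk through a large file is to read it once with FileRead and split it in memory with a parsing loop, as the comments in the earlier code suggest. The following is only a minimal sketch of that idea; file1.txt is a placeholder name and the line counter just stands in for whatever per-line work is needed.

```autohotkey
; Sketch only: file1.txt is a placeholder name.
FileRead, contents, file1.txt        ; one read for the whole file
if ErrorLevel
{
    MsgBox Could not read file1.txt
    ExitApp
}
count := 0
Loop, Parse, contents, `n, `r        ; split on `n and drop the `r of each line
{
    if (A_LoopField = "")            ; a final newline yields one empty field
        Continue
    count++
    ; A_LoopField holds the current line; process it here
}
MsgBox Parsed %count% non-empty lines.
```

Fetching lines one at a time with FileReadLine tends to be much slower on big files, so it is usually kept for small files or for cases where a single line is needed.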


