Simplified example:
Code: Select all
#NoEnv
Var := "
(Join
2`r`n
3`n
4`n
5`n
6`n
7`n
8`n
9`n
1
)"
MsgBox, 0, Before Sort, %Var%
Sort, Var
MsgBox, 0, After Sort, %Var%`n`nWhere is the 9?
Code: Select all
#NoEnv
Var := "
(Join
2`r`n
3`n
4`n
5`n
6`n
7`n
8`n
9`n
1
)"
MsgBox, 0, Before Sort, %Var%
Sort, Var
MsgBox, 0, After Sort, %Var%`n`nWhere is the 9?
It's still there, it's just hidden from the display because of the extra `r character in the data. Don't believe me? Check the before and after length of the string.just me wrote:Where is the 9?
Code: Select all
// ResultType Line::PerformSort()
// Terminate the variable's contents.
if (trailing_crlf_added_temporarily) // Remove the CRLF only after its presence was used above to simplify the code by reducing the number of types/cases.
{
if (dest[-2] == '\r') // ends with CRLF
{
dest[-2] = '\0';
output_var.ByteLength() -= 2 * sizeof(TCHAR);
}
else // ends with LF
{
dest[-1] = '\0';
output_var.ByteLength() -= sizeof(TCHAR);
}
}
else
...
That's how I understand Chris' comments, too. And I think that all other options (besides F) should work correctly. But surprisinglytrailing_crlf_added_temporarily = true; is set regardless of the U option. This should be changed.lexikos wrote:... otherwise the "unique" option will not work correctly.
Code: Select all
if (omit_dupes && item_prev)
{
// Update to the comment below: Exact dupes will still be removed when sort_by_naked_filename
// or g_SortColumnOffset is in effect because duplicate lines would still be adjacent to
// each other even in these modes. There doesn't appear to be any exceptions, even if
// some items in the list are sorted as blanks due to being shorter than the specified
// g_SortColumnOffset.
// As documented, special dupe-checking modes are not offered when sort_by_naked_filename
// is in effect, or g_SortColumnOffset is greater than 1. That's because the need for such
// a thing seems too rare (and the result too strange) to justify the extra code size.
// However, adjacent dupes are still removed when any of the above modes are in effect,
// or when the "random" mode is in effect. This might have some usefulness; for example,
// if a list of songs is sorted in random order, but it contains "favorite" songs listed twice,
// the dupe-removal feature would remove duplicate songs if they happen to be sorted
// to lie adjacent to each other, which would be useful to prevent the same song from
// playing twice in a row.
if (g_SortNumeric && !g_SortColumnOffset)
// if g_SortColumnOffset is zero, fall back to the normal dupe checking in case its
// ever useful to anyone. This is done because numbers in an offset column are not supported
// since the extra code size doensn't seem justified given the rarity of the need.
keep_this_item = (ATOF(*item_curr) != ATOF(item_prev)); // ATOF() ignores any trailing \r in CRLF mode, so no extra logic is needed for that.
else
keep_this_item = tcscmp2(*item_curr, item_prev, g_SortCaseSensitive); // v1.0.43.03: Added support for locale-insensitive mode.
// Permutations of sorting case sensitive vs. eliminating duplicates based on case sensitivity:
// 1) Sort is not case sens, but dupes are: Won't work because sort didn't necessarily put
// same-case dupes adjacent to each other.
// 2) Converse: probably not reliable because there could be unrelated items in between
// two strings that are duplicates but weren't sorted adjacently due to their case.
// 3) Both are case sensitive: seems okay
// 4) Both are not case sensitive: seems okay
//
// In light of the above, using the g_SortCaseSensitive flag to control the behavior of
// both sorting and dupe-removal seems best.
}
It affects the sort order, in the same way that using `r`n vs `n affects the sort order; shown below. At the moment, if you always use `r`n or always use `n, the results should be consistent (between the last item and other items). If trailing_crlf_added_temporarily depended on the U option, sort order would also depend on the U option.just me wrote:trailing_crlf_added_temporarily = true; is set regardless of the U option. This should be changed.
Code: Select all
x =
(Join
a`a2`r`n
a
)
y =
(Join
a`a2`n
a
)
Sort x
Sort y
MsgBox % x "`n---`n" y
I doubt that it would be "cleaner", mainly because I think it would be more convoluted, but I suppose it would produce better results.just me wrote:It will cost performance for the Unique sort, but would be much cleaner than the current solution.
I believe the bugs trailing_crlf_added_temporarily was intended to fix existed because `r wasn't handled at all. You can probably produce the same results artificially by using some other delimiter preceded by `r. You can also see the effect by using some other visible character in place of `r.maybe it had been here before v1.0.47.05?
Interesting, but personally I'd prefer to get the sort order of y in both cases. I doubt that there are 'real life' cases where the CR of a CRLF combination is intended to be a part of the items to be sorted. Also, we already get a different sort order depending on wether x has been read from a file by FileRead or from an edit control, because AHK 'kindly' removes the CR in the latter case.lexikos wrote:It affects the sort order, in the same way that using `r`n vs `n affects the sort order; shown below.
Code: Select all
#NoEnv
X := "a`ta`r`na"
Gui, Add, Edit, vED, %X%
GuiControlGet, Y, , ED
Sort, X
Sort, Y
FileDelete, Tmp.txt
FileAppend, %Y%, Tmp.Txt
FileRead, Z, Tmp.txt
Sort, Z
MsgBox % X "`n---`n" Y "`n---`n" Z
ExitApp
Dx: Specifies x as the delimiter character, which determines where each item in VarName begins and ends. If this option is not present, x defaults to linefeed (`n), which correctly sorts VarName if all of its lines either end in LF (`n) or CR+LF (`r`n). In the latter case CR (`r) is treated as part of the line to be sorted.
However, the current solution also ensures that the new final item (after sorting) doesn't end up with a trailing `r, and that the former final item (before sorting) ends up delimited by `r`n, not just `n.
The sort order depends on the contents, not on the options. If the U option affected trailing_crlf_added_temporarily, the sort order would depend on both the contents and the options. Anyway, it has to be applied regardless of U for the workaround-behaviour I noted above, which is also described in the comments.just me wrote:Also, we already get a different sort order depending on wether x has been read from a file by FileRead or from an edit control
Putting aside performance or simplicity, that would be ideal; it would mean treating `r as part of the delimiter, not in any way part of the item.Interesting, but personally I'd prefer to get the sort order of y in both cases.
I think it's entirely treated "as part of the line" and also halfways "as part of the delimiter". And it shouldn't be treated "as part of the delimiter" at all, because oflexikos wrote:The treatment of `r is halfway between "as part of the line" and "as part of the delimiter"
And only the delimiter should be added to the last line.If this option is not present, x defaults to linefeed (`n) ...
I agree, and I don't know how it could be done correctly without a perceptible loss of performance.Putting aside performance or simplicity, that would be ideal;
Well, the new option could be L (line sort). It would strip all CRLF as well as sole LF from the lines before sorting and add back the line delimiter of the first line after the sort. This shouldn't have severe affects on existing sorts. Also, it would solve the problem of unrecognized dupes for the last line.Chris wrote: // OLD/OBSOLETE comment from a section that was removed because it's now handled by this section:
// Check if special handling is needed due to the following situation:
// Delimiter is LF but the contents are lines delimited by CRLF, not just LF
// and the original/unsorted list's last item was not terminated by an
// "allowed delimiter". The symptoms of this are that after the sort, the
// last item will end in \r when it should end in no delimiter at all.
// This happens pretty often, such as when the clipboard contains files.
// In the future, an option letter can be added to turn off this workaround,
// but it seems unlikely that anyone would ever want to.
Users browsing this forum: No registered users and 48 guests