[v2.0.2] File path to file ID and back again

Saiapatsu
Posts: 17
Joined: 11 Jul 2019, 15:02

09 Feb 2023, 11:48

Code:

; https://www.codeproject.com/questions/273746/given-an-ntfs-file-id-is-there-any-official-way-to

; Get the 64-bit NTFS file index (the file ID) in two 32-bit halves.
; Optionally pass a VarRef as outhandle to receive the open file handle. You are responsible for closing it, e.g.:
; DllCall("CloseHandle", "Ptr", handle) || MsgBox("CloseHandle failed " A_LastError)
; Returns 1 on success or 0 on failure.
FilePathToIndex(path, &high, &low, outhandle := 0)
{
	static info := Buffer(52)
	
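	handle := DllCall("CreateFile"
		, "Str" , path
		, "UInt", 0 ; dwDesiredAccess: neither read nor write, which is enough to query metadata
		, "UInt", 7 ; dwShareMode: FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
		, "Ptr" , 0 ; lpSecurityAttributes: NULL
		, "UInt", 3 ; dwCreationDisposition: OPEN_EXISTING
		, "UInt", 0 ; dwFlagsAndAttributes: none (opening a directory would need FILE_FLAG_BACKUP_SEMANTICS, 0x02000000)
		, "Ptr" , 0 ; hTemplateFile: NULL
		, "Ptr") ; return type: HANDLE (DllCall would otherwise return a 32-bit Int)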
	
	if handle == -1
		return 0
	
	; https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information
	if !DllCall("GetFileInformationByHandle"
		, "Ptr", handle ; hFile
		, "Ptr", info) ; lpFileInformation
		return (DllCall("CloseHandle", "Ptr", handle), 0)
	; MsgBox "GetFileInformationByHandle failed " A_LastError
	
	high := NumGet(info, 44, "UInt") ; nFileIndexHigh
	low := NumGet(info, 48, "UInt") ; nFileIndexLow
	; MsgBox NumGet(info, 28, "UInt") ; dwVolumeSerialNumber
	
	if outhandle is VarRef
		%outhandle% := handle
	else
		DllCall("CloseHandle", "Ptr", handle)
	
	return 1
}

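; Returns the path as a string on success or 0 on failure.
; hint is an open handle to any file on the same volume (it is passed as hVolumeHint).
; volumelabel is a string whose first character is the drive letter to prepend to the path.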
FileIdToPath(hint, volumelabel, high, low, outhandle := 0)
{
	; The 1024-byte nameinfo buffer has room for a path of at most 510 characters, so it is too small for long paths.
	static fileid := (() => (NumPut("UInt", 24, "UInt", 0, buf := Buffer(24)), buf))(), nameinfo := Buffer(1024)
	
	NumPut("UInt", low, "UInt", high, fileid, 8)
	
	; The byte order of the file ID (low half first) and whether hVolumeHint is necessary were figured out by trial and error.
	; https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-openfilebyid
	; This unfortunate function requires an open handle to some file on the same volume.
	; Given that handle, the 64-bit file ID alone is enough to open the file.
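	handle := DllCall("OpenFileById"
		, "Ptr", hint ; hVolumeHint
		, "Ptr", fileid ; lpFileId
		, "UInt", 0 ; dwDesiredAccess
		, "UInt", 7 ; dwShareMode: FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
		, "Ptr", 0 ; lpSecurityAttributes
		, "UInt", 0 ; dwFlagsAndAttributes
		, "Ptr") ; return type: HANDLE (DllCall would otherwise return a 32-bit Int)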
	
	if handle == -1
		return 0
	
	; https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getfileinformationbyhandleex
	if !DllCall("GetFileInformationByHandleEx"
		, "Ptr", handle ; hFile
		, "UInt", 2 ; FileInformationClass: FILE_NAME_INFO
		, "Ptr", nameinfo ; lpFileInformation
		, "UInt", nameinfo.Size) ; dwBufferSize
		return (DllCall("CloseHandle", "Ptr", handle), 0)
	
	if outhandle is VarRef
		%outhandle% := handle
	else
		DllCall("CloseHandle", "Ptr", handle)
	
	length := NumGet(nameinfo, 0, "UInt")
	; Replace the first two wchars of this buffer with valid text and
	; interpret the whole buffer as a string
	; 0x003a is Ord ":"
	NumPut("UShort", Ord(volumelabel), "UShort", 0x003a, nameinfo)
	; length >> 1 turns length in bytes into length in wchars
	return StrGet(nameinfo, (length >> 1) + 2, "UTF-16")
}
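
The magic numbers in the two functions above (a 52-byte buffer, offsets 28, 44 and 48, and a 24-byte descriptor with the ID at offset 8) follow from the documented layouts of BY_HANDLE_FILE_INFORMATION and FILE_ID_DESCRIPTOR. Here is a small sketch that re-derives them. It is written in Python rather than AutoHotkey only so that it runs anywhere; `field_offsets` and `make_file_id_descriptor` are illustrative names, not part of the library.

```python
import struct

# BY_HANDLE_FILE_INFORMATION as packed little-endian fields.
# Each FILETIME is 8 bytes and is treated here as one unsigned 64-bit value.
FIELDS = [
    ("dwFileAttributes", "I"),
    ("ftCreationTime", "Q"),
    ("ftLastAccessTime", "Q"),
    ("ftLastWriteTime", "Q"),
    ("dwVolumeSerialNumber", "I"),
    ("nFileSizeHigh", "I"),
    ("nFileSizeLow", "I"),
    ("nNumberOfLinks", "I"),
    ("nFileIndexHigh", "I"),
    ("nFileIndexLow", "I"),
]


def field_offsets(fields):
    """Return ({field name: byte offset}, total size) for a packed layout."""
    offsets, position = {}, 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)
    return offsets, position


def make_file_id_descriptor(high, low):
    """Build the 24 bytes that FileIdToPath passes as lpFileId."""
    # dwSize = 24, Type = 0 (FileIdType), then the 64-bit ID stored low half first.
    # The 8 trailing pad bytes fill the union out to its full 16 bytes.
    return struct.pack("<IIII8x", 24, 0, low, high)


offsets, size = field_offsets(FIELDS)
# prints: 52 28 44 48
print(size, offsets["dwVolumeSerialNumber"], offsets["nFileIndexHigh"], offsets["nFileIndexLow"])
```

Storing the low half first is what makes the two UInts at offset 8 read back as a single little-endian 64-bit file ID.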
I wrote this to manage my screenshot collection. I have nearly a hundred thousand screenshots, and sometimes I'd like to find the one taken nearest to a specific date and time. To do that, I had to get the file IDs of all of my screenshots, cram them into a little index file alongside each screenshot's date, search the index for the date I want, and get the path of the corresponding screenshot to show it.

Minimal example that shows a message box with one of the full paths of "test.ahk" (a file with several hard links has several paths). The "C" assumes the file is on drive C:.

Code:

#include <FileId>
FilePathToIndex("test.ahk", &high, &low, &handle)
MsgBox FileIdToPath(handle, "C", high, low)
DllCall("CloseHandle", "Ptr", handle) || MsgBox("CloseHandle failed " A_LastError)
Code that built an unsorted index from all of my existing screenshots.

Code:

#include <FileId>

; Goes through all files in the directory A_Args[1] and dumps their time-since-epoch and file ID into a file.

epoch := "20130101000000"

outfile := FileOpen("logid-unsorted", 0x1)

Loop Files A_Args[1] "\*", "FR"
{
	if A_LoopFileName ~= "^20\d\d\d\d\d\d\d\d\d\d\d\d"
	{
		outfile.WriteUInt(DateDiff(SubStr(A_LoopFileName, 1, 14), epoch, "Seconds"))
		FilePathToIndex("\\?\" A_LoopFileFullPath, &high, &low) || (MsgBox("\\?\" A_LoopFileFullPath), ExitApp())
		outfile.WriteUInt(high)
		outfile.WriteUInt(low)
	}
}
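
For reference, each entry the script above writes is a 12-byte record of three little-endian 32-bit unsigned integers: seconds since the epoch, then the high and low halves of the file ID. Here is a rough sketch of the same record in Python (chosen only so it can be run anywhere). The file name and ID values are made up for illustration, and `seconds_since_epoch` and `pack_record` are hypothetical helper names.

```python
import struct
from datetime import datetime

EPOCH = datetime(2013, 1, 1)  # same epoch string as above: 20130101000000


def seconds_since_epoch(filename):
    """Parse the 14-digit YYYYMMDDHHMMSS prefix of a screenshot name."""
    taken = datetime.strptime(filename[:14], "%Y%m%d%H%M%S")
    return int((taken - EPOCH).total_seconds())


def pack_record(filename, high, low):
    """One 12-byte index record: time, file ID high half, file ID low half."""
    return struct.pack("<III", seconds_since_epoch(filename), high, low)


# Made-up example values, not taken from a real volume.
record = pack_record("20230201010017 example.png", 0x00050000, 0x0001E240)
print(len(record), record.hex())
```

Because every record has the same size, record number n always starts at byte n * 12, which is what the later scripts rely on.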
Lua (Luvit) script that sorts the output of the above script.

Code:

-- Sorts a time-high-low triplets file generated by time2-index-collect.ahk

local fs = require "fs"
local infile = fs.readFileSync "logid-unsorted"
local list = {}
for qwe in infile:gmatch("............") do
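	-- get the little-endian 32-bit time from the start of the record
	-- ("<I4" is always 4 bytes; "<L" is the platform's unsigned long, which can be 8 bytes)
	table.insert(list, {string.unpack("<I4", qwe), qwe})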
end
table.sort(list, function(a, b) return a[1] < b[1] end)
for i,v in ipairs(list) do list[i] = v[2] end
fs.writeFileSync("logid", table.concat(list))
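
If Luvit is not at hand, the same sort is easy to reproduce elsewhere. Below is a minimal Python sketch under the same assumptions as the Lua script: 12-byte records, sorted by the little-endian 32-bit time in the first four bytes, with any trailing partial record dropped. `sort_records` is an illustrative name.

```python
def sort_records(data, record_size=12):
    """Sort fixed-size records by the little-endian 32-bit time they start with."""
    # Split into whole records; a trailing partial record is ignored,
    # just as the 12-character Lua pattern would skip it.
    records = [
        data[start:start + record_size]
        for start in range(0, len(data) - record_size + 1, record_size)
    ]
    records.sort(key=lambda record: int.from_bytes(record[:4], "little"))
    return b"".join(records)


# Usage, with the file names used in this thread:
#   with open("logid-unsorted", "rb") as infile:
#       data = infile.read()
#   with open("logid", "wb") as outfile:
#       outfile.write(sort_records(data))
```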
Code to query a date from a file full of sorted <time, high, low> triplets.

Code:

#include <FileId>

idlogepoch := "20130101000000"
scrdir := A_MyDocuments "\Screenshots" ; root directory
idlogpath := scrdir "\logid"

MsgBox GetPathFromDate("20230201010017")

GetPathFromDate(timestamp)
{
	try
		t := DateDiff(timestamp, idlogepoch, "Seconds")
	catch
		return 0
	logid := FileOpen(idlogpath, 0)
	, a := 0
	, b := logid.Length // 12
	, c := (a + b) >> 1
	Loop
	{
		logid.Seek(c * 12)
		t2 := logid.ReadInt()
		; MsgBox t " " t2 "`n" a " " c " " b "`n" (name := FileIdToPath(logid.Handle, "C", logid.ReadUInt(), logid.ReadUInt()))
		if t2 == t || (b - a) < 3
			break
		if t2 < t
			a := c
		else
			b := c
		c := (a + b) >> 1
	}
	name := FileIdToPath(logid.Handle, "C", logid.ReadUInt(), logid.ReadUInt())
	
	; Ignore deleted or trashed files
	while (!name || name ~= "i)^.:\\\$Recycle\.Bin\\") && c > 0 ; i) because the folder is named $RECYCLE.BIN on some drives
		logid.Seek(--c * 12 + 4), name := FileIdToPath(logid.Handle, "C", logid.ReadUInt(), logid.ReadUInt())
	
	logid.Close()
	
	return name
}
It's also available on my GitHub.
I might not remember to update this thread when I change something. The code in this post is the 6th revision.

This is me just finishing something and then dumping it onto the forum just in case it's useful to anyone else.
cyruz
Posts: 348
Joined: 30 Sep 2013, 13:31

Re: [v2.0.2] File path to file ID and back again

09 Feb 2023, 17:19

Hi, I was just wondering whether, in your specific use case, you have any reason not to use the file path itself, which is also a unique identifier for the file.
ABCza on the old forum.
My GitHub.
Saiapatsu
Posts: 17
Joined: 11 Jul 2019, 15:02

Re: [v2.0.2] File path to file ID and back again

11 Feb 2023, 13:55

cyruz wrote:
09 Feb 2023, 17:19
Hi, I was just wondering whether, in your specific use case, you have any reason not to use the file path itself, which is also a unique identifier for the file.
Yes. I rename the files.
Each file's path is %screenshots%\<year><month> <category>\<14-digit date> <description>.
Ideally I'd write the description into a database, not the filename. However, the only image viewer that's nearly as good as IrfanView for viewing and operating on images, and flexible/scriptable enough to show me the database entry for a file in a certain directory, is feh, which is for Linux. MinGW's package manager doesn't have it, and it's not scriptable/flexible enough in the right ways anyway.
I would love to create my own image viewer that's better than all the others and can do this, but I'm not doing that today or tomorrow.

If I were to store the file path instead, then I would have to reindex everything regularly or update the index with the new file path whenever I rename anything.
IrfanView and Explorer do not allow me to run some external script/hook to update the index whenever I rename anything.

In addition, file IDs are fixed-length, hence the fast binary search in that last code snippet.
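
To illustrate why fixed-length records make this fast: record n always starts at byte n * 12, so each probe is one seek plus a 4-byte read, and a hundred thousand entries need about 17 probes. The sketch below is in Python (only so it can be run anywhere) and is a simplified variant, not the exact loop from the AutoHotkey snippet: it searches an in-memory copy of the index and returns the last record at or before the target time. `find_record` is an illustrative name.

```python
import struct

RECORD = struct.Struct("<III")  # time, file ID high half, file ID low half


def find_record(index, target):
    """Return (time, file_id) of the last record with time <= target.

    Falls back to the first record when every time is later than target,
    and returns None for an empty index.
    """
    count = len(index) // RECORD.size
    if count == 0:
        return None
    lo, hi = 0, count  # the answer's successor lies in the window [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        time = RECORD.unpack_from(index, mid * RECORD.size)[0]
        if time <= target:
            lo = mid + 1
        else:
            hi = mid
    position = max(lo - 1, 0)
    time, high, low = RECORD.unpack_from(index, position * RECORD.size)
    return time, (high << 32) | low
```

With variable-length paths in the index, finding record n would need either a scan from the start or a second offset table.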

The few benefits of storing names instead of IDs:
- The link between a file and its ID can be severed if I e.g. copy a file and delete the original, but this almost never happens (other than with gremlins like Git). A stored file name doesn't break this way.
- With file names, I can put parts of my collection on another partition or a network share or something.
What I'm saying is that hardlinks are good and stable enough for me.
cyruz
Posts: 348
Joined: 30 Sep 2013, 13:31

Re: [v2.0.2] File path to file ID and back again

12 Feb 2023, 01:34

Interesting, it makes sense. Thanks for describing your usage pattern; I'm kinda obsessed with organizing files and notes, and it's nice to know how others deal with this very same obsession.
