Discussion:
case sensitive file test
Bob Latham
2020-05-26 11:46:39 UTC
Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.

I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file
'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.

OS_File 17 does not appear to be case sensitive.

The only way I can see is to read the contents of the directory using
OS_GBPB 9 and wildcards and then test the characters for lower case.

I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. On a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.

I've been stumped for some time as to why it fails. I ask for 1 file
at a time (R3=1) and I keep going until R4=-1. I've tried pointing R6
at all sorts, all terminated with a zero byte.

folder/jpg<0>
*older/jpg<0>
*/jpg<0>

No matter which, 7 failures out of ten.

I'm thinking daft things now. This search follows another looking for
another wild carded file actually */flac which works every time. If
you find what you want with OS_GBPB 9 you don't need to keep going to
R4=-1 do you? You can jump out of the loop if you find what you need
can't you? Getting desperate to understand the problem.

Thanks

Bob.
--
Bob Latham
Stourbridge, West Midlands
Bob Latham
2020-05-26 13:21:52 UTC
Post by Bob Latham
Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.
I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file
'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.
OS_File 17 does not appear to be case sensitive.
The only way I can see is to read the contents of the directory
using OS_GBPB 9 and wildcards and then test the characters for
lower case.
I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. On a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.
[Snip]

Okay, found the problem (eventually) with OS_GBPB 9 buffer size!

But if anyone has a good way to test for a lowercase file name I'd
love to hear it.



Thanks

Bob.
--
Bob Latham
Stourbridge, West Midlands
Kevin Wells
2020-05-26 14:34:59 UTC
Post by Bob Latham
Post by Bob Latham
Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.
I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file
'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.
OS_File 17 does not appear to be case sensitive.
The only way I can see is to read the contents of the directory
using OS_GBPB 9 and wildcards and then test the characters for
lower case.
I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. On a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.
[Snip]
Okay, found the problem (eventually) with OS_GBPB 9 buffer size!
But if anyone has a good way to test for a lowercase file name I'd
love to hear it.
If it is just the first letter that has to be lower case why not try for
just the first letter by the letter code, e.g. lower case f is CHR$(102)
while uppercase F is CHR$(70)
Post by Bob Latham
Thanks
Bob.
--
Kev Wells
http://kevsoft.co.uk/ https://ko-fi.com/kevsoft
carpe cervisium
Idiot in search of a village.
Steve Fryatt
2020-05-26 16:12:40 UTC
On 26 May, Kevin Wells wrote in message
Post by Kevin Wells
But if anyone has a good way to test for a lowercase file name I'd love
to hear it.
If it is just the first letter that has to be lower case why not try for
just the first letter by the letter code, e.g. lower case f is CHR$(102)
while uppercase F is CHR$(70)
More generally, and allowing for a full set of possible characters:

DEF FNis_lower(string$)
LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%

SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%

FOR loop% = 1 TO LEN(string$)
char% = ASC(MID$(string$, loop%, 1))

byte% = char% DIV 8
bit% = char% MOD 8

alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0

IF alpha% AND (NOT lower%) THEN =FALSE
NEXT loop%

=TRUE
--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/
Bob Latham
2020-05-26 16:42:46 UTC
Post by Steve Fryatt
DEF FNis_lower(string$)
LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%
SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%
FOR loop% = 1 TO LEN(string$)
char% = ASC(MID$(string$, loop%, 1))
byte% = char% DIV 8
bit% = char% MOD 8
alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0
IF alpha% AND (NOT lower%) THEN =FALSE
NEXT loop%
=TRUE
Wow, just wow. I didn't expect that Steve thanks very much. Now if I
can just get to understand it I'll learn something...

I presume this is a means to test a directory listing to make sure an
entry is lower case?


Here is my effort, don't laugh, it's only a hobby for me.

As you can see, when looking for one specific file name I'm unsure
what I can get away with with R6 wildcard. The nearer to actually
using no wildcards the less false returns I would get.


.folderJs1 EQUS"folder/jpg":EQUB 0:ALIGN
.folderJs4 EQUS"#older/jpg":EQUB 0:ALIGN
.incsvExt1 EQUD incsv

.JcmpFlnme STMFD R13!,{R0-R3,R14}
LDR R2,buffExtsn
ADR R3,folderJs1
.kChkfldRL LDRB R0,[R2],#1
LDRB R1,[R3],#1
CMP R0,R1
BNE JcmpFlnE ; no match
CMP R0,#0
BNE kChkfldRL
.JcmpFlnE LDMFD R13!,{R0-R3,PC}; equal=found NE= not found

.gotftype STMFD R13!,{R1-R6,R14}
MOV R6,R0
MOV R0,#9
LDR R1,incsvExt1
LDR R2,buffExtsn; ADR R2,buffer1
MOV R4,#0
STR R4,foldrFlag
STR R4,[R2]
MOV R5,#256

.gotFtp2 MOV R3,#1
SWI "OS_GBPB"; are there any *.flac files?
BCS gotFtp3
CMN R4,#1; R4 = -1
BNE gotFtp2
CMP R5,#&200; just to clear carry
BAL gotFtp7; not type looked for

.gotFtp3 MOV R0,#9
LDR R1,incsvExt1
LDR R2,buffExtsn; ADR R2,buffer1
MOV R4,#0
MOV R5,#256
ADR R6,folderJs4

.gotFtp4 MOV R3,#1
SWI "OS_GBPB"; is there a folder.jpg file
BCC gotFtp5
BL JcmpFlnme; compare against lower case folder.jpg
BNE gotFtp5

LDR R2,buffExtsn
LDRB R0,[R2]
STRB R0,foldrFlag
BAL gotFtp6

.gotFtp5 CMN R4,#1; R4 = -1
BNE gotFtp4

.gotFtp6 MOV R2,R1
CMP R2,R1; set carry

.gotFtp7 LDRB R0,foldrFlag
LDMFD R13!,{R1-R6,PC}

.foldrFlag EQUD 0
--
Bob Latham
Stourbridge, West Midlands
Steve Fryatt
2020-05-26 17:21:19 UTC
On 26 May, Bob Latham wrote in message
Post by Bob Latham
I presume this is a means to test a directory listing to make sure an
entry is lower case?
No, it's just a generic "is this string lower case" test. The two SWIs
return pointers to tables of bit flags (so 32 bytes of 8 bits each, for all
256 characters in a RISC OS character set). In alpha_table%, a bit is set if
the character is alphabetic; in lower_table%, it's set if the character is
considered lower case.
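
To make the bit arithmetic concrete, here is a minimal sketch of a
single-character lookup. FNchar_has is a made-up helper name, not part of
any library; the property codes 2 (lower case) and 3 (alphabetic) are the
ones used in FNis_lower above. For "f" (code 102) the flag is bit 6 of
byte 12, since 102 DIV 8 = 12 and 102 MOD 8 = 6.

```basic
REM Sketch only: test one character against a Territory property table.
REM FNchar_has is a hypothetical helper name.
PRINT FNchar_has(ASC("f"), 2) : REM lower-case flag for "f": byte 12, bit 6
PRINT FNchar_has(ASC("F"), 2) : REM lower-case flag for "F": byte 8, bit 6
PRINT FNchar_has(ASC("/"), 3) : REM alphabetic flag for "/": byte 5, bit 7
END

DEF FNchar_has(char%, property%)
LOCAL table%
REM -1 selects the current territory; R0 on exit points to 32 bytes of flags
SYS "Territory_CharacterPropertyTable", -1, property% TO table%
REM Pick the byte holding this character's flag, then mask out its bit
=((table%?(char% DIV 8)) AND (1 << (char% MOD 8))) <> 0
```

Fetching the table pointer on every call keeps the sketch short; in a loop
over many names it would make sense to read the pointers once, as
FNis_lower does.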

You still need OS_GBPB to find the names to test.
--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/
Bob Latham
2020-05-26 18:46:08 UTC
Post by Steve Fryatt
On 26 May, Bob Latham wrote in message
Post by Bob Latham
I presume this is a means to test a directory listing to make
sure an entry is lower case?
No, it's just a generic "is this string lower case" test. The two
SWIs return pointers to tables of bit flags (so 32 bytes of 8 bits
each, for all 256 characters in a RISC OS character set). In
alpha_table%, a bit is set if the character is alphabetic; in
lower_table%, it's set if the character is considered lower case.
You still need OS_GBPB to find the names to test.
Understood, thank you.

Cheers,

Bob.
--
Bob Latham
Stourbridge, West Midlands
Steve Drain
2020-05-27 12:14:37 UTC
Post by Steve Fryatt
DEF FNis_lower(string$)
LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%
SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%
FOR loop% = 1 TO LEN(string$)
char% = ASC(MID$(string$, loop%, 1))
byte% = char% DIV 8
bit% = char% MOD 8
alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0
IF alpha% AND (NOT lower%) THEN =FALSE
NEXT loop%
=TRUE
Perhaps:

DEF FNis_lower(string$)
LOCAL buff%,lower%,char%
buff%=&8200:REM use input buffer or other block
$buff%=string$
SYS "Territory_LowerCaseTable",-1 TO lower%
FOR char%=buff% TO buff%+LENstring$-1
REM compare with the lower-case mapping so that "/" and digits still pass
IF ?char%<>lower%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

Or, if you want to disentangle it, try:

DEF FNis_lower(string$)
LOCAL lower%,char%
SYS "Territory_LowerCaseTable",-1 TO lower%
FOR char%=&8100 TO &8100+LENstring$-1
REM compare with the lower-case mapping so that "/" and digits still pass
IF ?char%<>lower%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

;-)
j***@mdfs.net
2020-05-27 23:25:40 UTC
DEF FNis_lower($&8100)
LOCAL lower%,char%
SYS "Territory_LowerCaseTable",-1 TO lower%
char%=&8100-1
REPEAT
char%=char%+1
UNTIL ?char%<>lower%??char% OR ?char%=13
=?char%=13
Steve Drain
2020-05-28 13:16:37 UTC
Post by j***@mdfs.net
DEF FNis_lower($&8100)
LOCAL lower%,char%
SYS "Territory_LowerCaseTable",-1 TO lower%
char%=&8100-1
REPEAT
char%=char%+1
UNTIL ?char%<>lower%??char% OR ?char%=13
=?char%=13
There are many ways to skin this cat and speed is hardly important these
days, but I think an early exit from the loop on first failure is
worthwhile. It certainly would be with a long string.

BTW my trick of using the string accumulator (&8100) works because the
LENstring function put the string in there. It is only safe until the
next string keyword and I would never actually use it.
druck
2020-06-01 19:01:57 UTC
Post by Steve Drain
There are many ways to skin this cat and speed is hardly important these
days,
It can be if you hit a directory on a file server with many thousand
entries - it certainly lets you know who OS_GBPB's one entry at a time,
and who uses a decent sized buffer!

---druck
Bob Latham
2020-05-26 16:29:24 UTC
Post by Kevin Wells
Post by Bob Latham
Okay, found the problem (eventually) with OS_GBPB 9 buffer size!
But if anyone has a good way to test for a lowercase file name I'd
love to hear it.
If it is just the first letter that has to be lower case why not
try for just the first letter by the letter code, e.g. lower case f
is CHR$(102) while uppercase F is CHR$(70)
Thanks for that and to be honest that would probably do. I just
thought it was odd that there doesn't appear to be a way of being
case sensitive without doing the testing yourself. I might have
expected a flag on the entry to OS_File 17 to say fixed case but it
appears not.

Thanks for the help.

Cheers,

Bob.
--
Bob Latham
Stourbridge, West Midlands
Steve Fryatt
2020-05-26 16:52:02 UTC
On 26 May, Bob Latham wrote in message
Thanks for that and to be honest that would probably do. I just thought it
was odd that there doesn't appear to be a way of being case sensitive
without doing the testing yourself. I might have expected a flag on the
entry to OS_File 17 to say fixed case but it appears not.
RISC OS filing systems are not case sensitive, full stop. Create yourself
two files in a HostFS folder with RPCEmu from the host system, using names
like "Text" and "text", and see things go wrong when accessing them in RISC
OS.
--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/
Bob Latham
2020-05-26 17:06:30 UTC
Post by Steve Fryatt
RISC OS filing systems are not case sensitive, full stop.
Okay, it was only an idea and a question.

Bob.
--
Bob Latham
Stourbridge, West Midlands
John Williams (News)
2020-05-26 17:00:00 UTC
I might have expected a flag on the entry to OS_File 17 to say fixed case
but it appears not.
Is not, and has the filer not always been, famously case agnostic?

And as a consequence, isn't your expectation above a bit unreasonable?

John
John Williams (News)
2020-05-26 17:07:41 UTC
Is there nothing in the file content you could work with rather than this
name-case business?

John
Bob Latham
2020-05-26 18:40:03 UTC
Post by John Williams (News)
I might have expected a flag on the entry to OS_File 17 to say
fixed case but it appears not.
Is not, and has the filer not always been, famously case agnostic?
I can't say it has ever been high on my thoughts so not that famous.
Post by John Williams (News)
And as a consequence, isn't your expectation above a bit
unreasonable?
"Unreasonable"

Of course yes, how nice of you to point it out.

Bob.
Post by John Williams (News)
John
--
Bob Latham
Stourbridge, West Midlands
David Higton
2020-05-26 17:08:26 UTC
Post by Bob Latham
But if anyone has a good way to test for a lowercase file name I'd
love to hear it.
RISC OS filing systems are case insensitive. The only way you can do
what you want is to iterate through the filenames, and do whatever
test you want on each filename returned.
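
As a rough sketch of that iteration in BASIC: pass the wanted name to
OS_GBPB 9 as the match pattern, so that only case variants of it come
back, then rely on BASIC's string comparison, which is case sensitive, for
the exact test. FNexact_case_exists and FNread_z are made-up names, and
the 1 KB buffer and the batch of 16 names are arbitrary choices.

```basic
REM Sketch only: does dir$ hold an object named exactly name$, case included?
REM FNexact_case_exists and FNread_z are hypothetical helper names.
DIM gbpb_buf% 1023 : REM 1024-byte buffer, reserved once

PRINT FNexact_case_exists("@", "folder/jpg")
END

DEF FNexact_case_exists(dir$, name$)
LOCAL offset%, read%, ptr%, leaf$, found%
offset% = 0
found% = FALSE
REPEAT
  REM R3 = names wanted, R4 = offset, R5 = buffer size, R6 = match pattern
  SYS "OS_GBPB", 9, dir$, gbpb_buf%, 16, offset%, 1024, name$ TO ,,, read%, offset%
  ptr% = gbpb_buf%
  WHILE read% > 0
    leaf$ = FNread_z(ptr%)
    IF leaf$ = name$ THEN found% = TRUE : REM "=" on strings is case sensitive
    ptr% += LEN(leaf$) + 1 : REM step over the name and its zero terminator
    read% -= 1
  ENDWHILE
  REM A call may return no names before the end, so only R4 = -1 ends the scan
UNTIL found% OR offset% = -1
=found%

REM Read a zero-terminated string starting at ptr%
DEF FNread_z(ptr%)
LOCAL s$
WHILE ?ptr% <> 0
  s$ += CHR$(?ptr%)
  ptr% += 1
ENDWHILE
=s$
```

Because the pattern contains no wildcards, each directory normally yields
at most one name, so the in-memory comparison costs next to nothing.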

David
Bob Latham
2020-05-26 18:40:53 UTC
Post by David Higton
Post by Bob Latham
But if anyone has a good way to test for a lowercase file name I'd
love to hear it.
RISC OS filing systems are case insensitive. The only way you can
do what you want is to iterate through the filenames, and do
whatever test you want on each filename returned.
Thank you David.

Cheers,

Bob.
--
Bob Latham
Stourbridge, West Midlands
Erik G
2020-06-01 01:19:56 UTC
A general afterthought about the efficiency (speed wise) of searching
a directory tree.
Post by Bob Latham
Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.
I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file
'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.
OS_File 17 does not appear to be case sensitive.
The only way I can see is to read the contents of the directory using
OS_GBPB 9 and wildcards and then test the characters for lower case.
I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. on a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.
(NOTE: it has been a long time since I studied the internals of
ADFS. Specific efficiency details of SWI calls such as OS_FILE and
OS_GBPB will have significant effect on the real runtime of any
such program. Read documentation and experiment to find the best
solution)

== In short, the thing I want to impress on all programmers is this:

To make any algorithm involving disk I/O fast, the focus needs to be
on:
- Making as few reads as possible
- Reading as much data in one operation as possible

Also:
- Don't spend much effort optimising the processing of the data by
the CPU, as the disk I/O will dominate the time the algorithm takes
to complete.

This example case of searching through a directory tree involves
reading several (or a lot) of directories and processing the
information with a program.
By far the most time-consuming part of this is the physical reading
of the information from a disk.
Reading one block of data requires:
1) moving the disk head to the correct track
2) waiting for the disk to rotate to the sector that contains the block
3) reading the magnetic information from the disk and transferring it
to memory.

Of these, steps 1 and 2 take up the most time, in the order of
milliseconds.

By comparison, you can do tons of CPU processing in a few milliseconds.

Note that reading several blocks in a row on the same track
returns more data, but only requires one head move (step 1) and one
wait (step 2).
Also note that continuing to read from the next track only needs
a very short (and thus quick) head move, while the wait time can be
practically eliminated by organising the disk in such a way that the
next block to read on this next track shows up just as the head has
settled in its new position.

So in the case of traversing a directory structure, it would be much
more efficient to read an entire directory in one go and then
process the data in memory (e.g. searching for a file that matches
a certain name or pattern), than it would be to ask for the first
directory entry, process it, then ask for the second entry, process
it, etcetera.
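
On RISC OS, a minimal sketch of that batched approach might look like the
following. It asks OS_GBPB 9 for up to 64 names per call into a 4 KB
buffer and then walks the zero-terminated names in memory. The buffer
size, the batch size and the directory "@" are assumptions chosen for
illustration, not tuned values.

```basic
REM Sketch only: list a directory in large batches rather than one name per call.
size% = 4096
DIM buf% size% - 1
dir$ = "@" : REM assumed directory to scan (the current directory)
offset% = 0 : calls% = 0 : total% = 0
REPEAT
  REM Ask for up to 64 names; R3 on exit says how many were actually read
  SYS "OS_GBPB", 9, dir$, buf%, 64, offset%, size%, "*" TO ,,, read%, offset%
  calls% += 1
  ptr% = buf%
  WHILE read% > 0
    REM Names are packed back to back, each ending in a zero byte
    leaf$ = ""
    WHILE ?ptr% <> 0
      leaf$ += CHR$(?ptr%)
      ptr% += 1
    ENDWHILE
    ptr% += 1 : REM step over the zero terminator
    PRINT leaf$
    total% += 1
    read% -= 1
  ENDWHILE
UNTIL offset% = -1
PRINT ;total%; " names in "; calls%; " calls"
```

Changing the 64 to 1 and comparing the number of calls gives a quick feel
for the difference, especially on a network filing system.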

My advice for this particular program is to find the best combination
of SWI calls to get a good I/O performance.

In a more general sense it is a lot more efficient to read one big file
with all the data in it rather than have that data spread over lots
of small files. (For example: the game Kerbal Space Program used to
have every detail of the game in a separate file, taking up tens of
thousands of files.
It took several minutes to load. In recent versions many of
those files have been combined into a smaller number of bigger files,
and now the program loads in under a minute.)

And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.
--
Erik G.
From address is fake
See http://erikgrnh.home.xs4all.nl/
Bob Latham
2020-06-01 14:53:53 UTC
Post by Erik G
And finally: developers of filing systems have worked for decades
to optimise the finding, reading, writing, extending and deletion
of files, using every trick in the book and inventing new ones,
because disk I/O is one of the major bottlenecks in the speed at
which programs run.
Thank you for an interesting read.

In my case I'm checking for various things in a music library stored
on a Synology DS214+. My program, written in assembler, uses Lanman98
to access the NAS which was quite a bit faster than moonfish.

The program examines every album and checks for images, file types,
and tags. On flac albums (all 3390 of them), for every track on every
album the file is opened and the tagging checked and then the file is
closed again.

The program then gives a report on any nonconformity to various
parameters set.

It varies slightly from run to run but it takes about 14 minutes and
20 seconds to complete. I'm impressed with the speed.

Thanks again.

Bob.
--
Bob Latham
Stourbridge, West Midlands
druck
2020-06-01 19:57:46 UTC
Post by Erik G
And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.
Unfortunately except on RISC OS, where no use is made of free memory to
cache filing system operations, as just about every other common OS does.

The closest RISC OS comes is some fixed-size buffering in ADFS, which
often resulted in the Risc PC's slow motherboard IDE interface
outperforming much better 3rd party IDE hardware using IDEFS variants
with no caching.

---druck
j***@mdfs.net
2020-06-04 16:23:40 UTC
Similarly, if there's some I/O information that won't change over the
run of a program, read it once into a variable, then access the variable.
For example:
size%=EXT#inputfile then use size% instead of EXT#
If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

etc.
Martin
2020-06-04 16:51:27 UTC
On 04 Jun in article
Post by j***@mdfs.net
Similarly, if there's some I/O information that won't change over
the run of a program, read it once into a variable, then access the
variable.
size%=EXT#inputfile then use size% instead of EXT#
Excellent advice, in general ... but this example ...
Post by j***@mdfs.net
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%
is a bad one, because if it is a Wimp program the mode is usually
changed outside your program, so ModeChange messages have to be
watched for and the relevant variables read again.
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.
druck
2020-06-04 19:49:53 UTC
Post by j***@mdfs.net
Similarly, if there's some I/O information that won't change over the
run of a program, read it once into a variable, then access the variable.
For example: size%=EXT#inputfile then use size% instead of EXT#
Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.

1) Obviously if it's a Wimp application, other tasks are running
2) If the single tasking program can be run in a taskwindow or graphic
taskwindow, other tasks are running
3) If the file is on a remote filing system, other machines may alter it
4) If the file is on a local filing system which is shared, other
machines may alter it.

So only if you are outside the desktop, and storage is on a local
non-shared disc, can you be sure it won't be altered by anything else.
Post by j***@mdfs.net
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%
Only if it's running outside the desktop. Inside the desktop the mode can
change, so you need to ensure you handle the mode change message and
re-read any mode related parameters you are using.
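
One way to arrange that, as a minimal sketch: keep every mode-dependent
read in a single procedure, so the cached values can be refreshed from one
place. PROCread_mode is a made-up name; xsz% and ysz% follow the example
quoted above, and mode variables 11 and 12 (XWindLimit and YWindLimit) are
used here as one plausible choice for "whatever".

```basic
REM Sketch only: cache mode-dependent values, but keep them re-readable.
REM PROCread_mode is a hypothetical name.
PROCread_mode
PRINT "Screen is "; xsz%; " x "; ysz%; " pixels"
END

DEF PROCread_mode
REM -1 = current mode; 11 = XWindLimit, 12 = YWindLimit (both are pixels - 1)
SYS "OS_ReadModeVariable", -1, 11 TO ,,xsz%
SYS "OS_ReadModeVariable", -1, 12 TO ,,ysz%
xsz% += 1 : ysz% += 1
ENDPROC
```

In a Wimp task the same procedure would be called again whenever
Wimp_Poll delivers Message_ModeChange (&400C1).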

---druck
j***@mdfs.net
2020-06-04 23:18:24 UTC
Post by druck
For example: size%=EXT#inputfile then use size% instead of EXT#
Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.
If it's open for input, other processes *can't* alter it.
Read By Many, Write By One.
Post by druck
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%
Only if it's running outside the desktop. Inside the desktop the mode can
change, so you need to ensure you handle the mode change message and
re-read any mode related parameters you are using.
Which is why I wrote 'your program is never going to change screen
mode'. Maybe it should have been 'where the screen mode is never
going to be changed during the execution of the program'. Such as
a command line tool or a single-tasking application.
druck
2020-06-05 10:27:33 UTC
Post by j***@mdfs.net
Post by druck
For example: size%=EXT#inputfile then use size% instead of EXT#
Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.
If it's open for input, other processes *can't* alter it.
Read By Many, Write By One.
It's down to the implementation of the filing system as to whether that is
true. Local filing systems will tend to lock on write, remote ones will
tend not to. It's a bit of a minefield!

---druck
