Discussion:
Memory corruption problem
Alan Adams
2018-11-16 13:15:03 UTC
Hi

I have a huge BASIC project which uses TCPIP networking. I am
currently making some changes, and have hit a snag. Something seems to
be corrupting system memory. The symptoms are that at apparently
random times I get "file has been closed or handle is invalid", mainly
from my programs but occasionally also from Messenger. Occasionally
also the fonts get changed on the desktop, and shutdown produces "font
not found".

When I get "invalid handle" the handle value is still the one that
previous writes to the file used.

Hermes also occasionally reports a "network problem" which requires a
reboot. While this might be due to runaway socket allocation,
exhausting the supply, it also might indicate corruption of the areas
pointed to, in the same way I suspect the areas referenced by file
handles are being altered.

It would help if I knew what areas of memory to look at after the
event - the things overwritten there (if that is indeed what is
happening) may tell me what part of my code is getting it wrong.
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Martin
2018-11-16 15:21:01 UTC
Post by Alan Adams
[snip: original post]
I cannot think how something that is changed within an application's
wimpslot by that application could possibly cause problems for any
other application.

However, if you use the indirection operators !, ? and $, it is
possible that something that *should* be within your wimpslot is not.
Is it possible to validate write addresses?

If your applications use memory that is outside your slot (probably
RMA?) then overrunning a block, or writing to the wrong address, can
have predictable unpredictable results. Again, is it possible to
validate writes?

If an RMA block has more written than its size, you are likely to get
Heap errors at some time, but it depends on what and how much is
overwritten. Note that *ReportHeap always checks the validity of the
heap and reports any problems with the chains (see example HeapCheck
program). This can give some clues about an overrun problem before it
causes chaos.

The smaller the overwrite, the harder it can be to track down - but
yours sound larger to me. However, remember that they could all be
caused by spurious SYS calls as well.

Martin
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.
Alan Adams
2018-11-16 15:53:45 UTC
Post by Martin
Post by Alan Adams
[snip: original post]
I cannot think how something that is changed within an application's
wimpslot by that application could possibly cause problems for any
other application.
However, if you use the indirection operators !, ? and $, it is
possible that something that *should* be within your wimpslot is not.
Is it possible to validate write addresses?
There are a huge number of indirection operations, not least among the
ring buffers used in the network communications. My suspicion is that
somewhere among them a completely invalid address is being produced.
Post by Martin
If your applications use memory that is outside your slot (probably
RMA?) then overrunning a block, or writing to the wrong address, can
have predictable unpredictable results. Again, is it possible to
validate writes?
I'm not using RMA
Post by Martin
If an RMA block has more written than its size, you are likely to get
Heap errors at some time, but it depends on what and how much is
overwritten. Note that *ReportHeap always checks the validity of the
heap and reports any problems with the chains (see example HeapCheck
program). This can give some clues about an overrun problem before it
causes chaos.
The smaller the overwrite, the harder it can be to track down - but
yours sound larger to me. However, remember that they could all be
caused by spurious SYS calls as well.
I'm wondering whether a high-vector build would help. If the disk and
socket data areas are protected, it would catch such invalid writes.
I'm on an ARMX6, and I don't know immediately whether there is a
high-vector build available. I seem to recall that there is.
Post by Martin
Martin
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Martin
2018-11-16 20:04:23 UTC
[Snip]
Post by Alan Adams
Post by Martin
However, if you use the indirection operators !, ? and $, it is
possible that something that *should* be within your wimpslot is not.
Is it possible to validate write addresses?
There are a huge number of indirection operations, not least among
the ring buffers used in the network communications. My suspicion
is that somewhere among them a completely invalid address is being
produced.
Post by Martin
If your applications use memory that is outside your slot
(probably RMA?) then overrunning a block, or writing to the wrong
address, can have predictable unpredictable results. Again, is it
possible to validate writes?
I'm not using RMA
Where is the memory you *should* be using?
Are you using lots of individually allocated blocks?
Or are you using your own Heap to manage blocks?

If there are a multiplicity of writes that cannot be verified, can
you add a guard word at the end of each block, and check they are
valid as often as possible?
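A minimal sketch of the guard-word idea in BBC BASIC, assuming the blocks are allocated with DIM. FNnew_block, PROCcheck_block and the marker value in Guard% are hypothetical names chosen for illustration; the block size is rounded up to a whole word so the guard can be read with the ! operator.

```basic
REM Sketch: trailing guard words on DIM'd blocks (hypothetical names)
Guard% = &5A5A5A5A : REM arbitrary marker value, an assumption

buf% = FNnew_block(256)
buf%?255 = 1 : REM last legal byte of the block; does not touch the guard
PROCcheck_block(buf%, 256, "receive buffer")
PRINT "Guard word intact"
END

REM Allocate size% bytes followed by a word-aligned guard word
DEF FNnew_block(size%)
LOCAL b%, rounded%
rounded% = (size% + 3) AND NOT 3
DIM b% rounded% + 3 : REM DIM n reserves n+1 bytes, i.e. rounded%+4 here
b%!rounded% = Guard%
= b%

REM Stop with an error if anything has written past the end of the block
DEF PROCcheck_block(b%, size%, name$)
LOCAL rounded%
rounded% = (size% + 3) AND NOT 3
IF b%!rounded% <> Guard% THEN ERROR 1, "Guard word overwritten: " + name$
ENDPROC
```

Calling PROCcheck_block after each burst of writes narrows down which code overran a block. Note that it only detects overruns past the end; a write through a completely wrong pointer will normally miss the guard altogether.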
Post by Alan Adams
I'm wondering whether a high-vector build would help. If the disk
and socket data areas are protected, it would catch such invalid
writes. I'm on an ARMX6, and I don't know immediately whether
there is a high-vector build available. I seem to recall that
there is.
Not sure if one is available ... but anyway, the HV build only
protects what was in 'Zero page' which probably excludes disk &
socket areas.
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.
Alan Adams
2018-11-16 22:38:10 UTC
Post by Martin
[Snip]
Post by Alan Adams
Post by Martin
However, if you use the indirection operators !, ? and $, it is
possible that something that *should* be within your wimpslot is not.
Is it possible to validate write addresses?
There are a huge number of indirection operations, not least among
the ring buffers used in the network communications. My suspicion
is that somewhere among them a completely invalid address is being
produced.
Post by Martin
If your applications use memory that is outside your slot
(probably RMA?) then overrunning a block, or writing to the wrong
address, can have predictable unpredictable results. Again, is it
possible to validate writes?
I'm not using RMA
Where is the memory you *should* be using?
Are you using lots of individually allocated blocks?
Or are you using your own Heap to manage blocks?
Individual blocks
Post by Martin
If there are a multiplicity of writes that cannot be verified, can
you add a guard word at the end of each block, and check they are
valid as often as possible?
I'm doing that for the ring buffers, to guard against overruns.
However I don't think this is an overrun, more likely a miscalculated
pointer, which could be absolutely anywhere in the address space.

It would be nice to have proper memory protection - I grew up with VMS
which would produce "access violation" when this happened, and a stack
dump to show you how you got to it. Unfortunately putting reportstack
in the error handler is useless, because the call stack has already
gone.
Post by Martin
Post by Alan Adams
I'm wondering whether a high-vector build would help. If the disk
and socket data areas are protected, it would catch such invalid
writes. I'm on an ARMX6, and I don't know immediately whether
there is a high-vector build available. I seem to recall that
there is.
Not sure if one is available ... but anyway, the HV build only
protects what was in 'Zero page' which probably excludes disk &
socket areas.
I've upgraded to 5.25, which made no difference. I don't think it's
high-vector though - how can I tell?
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Martin
2018-11-16 23:46:15 UTC
[Snip]
Post by Alan Adams
Post by Martin
If there are a multiplicity of writes that cannot be verified, can
you add a guard word at the end of each block, and check they are
valid as often as possible?
I'm doing that for the ring buffers, to guard against overruns.
However I don't think this is an overrun, more likely a
miscalculated pointer, which could be absolutely anywhere in the
address space.
All I can suggest is to write a simple PROC to check that a pointer
lies between TOP and HIMEM. Then add calls before your most likely pointer
writes, until you have either done them all or found a culprit.
Tedious, unless anyone can think of a better way!
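A sketch of such a check, assuming all of the program's buffers are DIM'd in the BASIC heap, so that legitimate addresses lie between TOP and HIMEM. PROCcheck_ptr is a hypothetical name, not a BASIC keyword; the short demo at the top simply guards one word write.

```basic
REM Demo: check the address before a word write into a DIM'd block
DIM buf% 255
off% = 16
PROCcheck_ptr(buf% + off%, 4, "demo write")
buf%!off% = &12345678
PRINT "Write accepted"
END

REM Sketch: hypothetical helper, not part of BASIC itself
REM Stops with an error if addr%..addr%+len%-1 falls outside TOP..HIMEM
DEF PROCcheck_ptr(addr%, len%, where$)
IF addr% < TOP OR addr% + len% > HIMEM THEN
  ERROR 1, "Bad pointer &" + STR$~addr% + " in " + where$
ENDIF
ENDPROC
```

The range also covers BASIC's own variables and stack, so passing the check does not prove a pointer is right, only that it is not wildly wrong. A wild pointer into system memory, which is what the symptoms suggest, would be caught.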
Post by Alan Adams
It would be nice to have proper memory protection - I grew up with
VMS which would produce "access violation" when this happened, and
a stack dump to show you how you got to it. Unfortunately putting
reportstack in the error handler is useless, because the call
stack has already gone.
It is one of the limitations of RISC OS, and BASIC in particular.
I was used to MVS protection and debugging facilities!
Post by Alan Adams
I've upgraded to 5.25, which made no difference. I don't think it's
high-vector though - how can I tell?
!ScrHelp from Chris Hall will tell you ... and there are probably
other ways as well.

Martin
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.
David Buck
2018-11-17 07:15:37 UTC
Post by Alan Adams
[snip: original post]
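Sounds like a classic CLOSE#0 occurring. CLOSE#0 closes all open files
(on the current filing system), not just your own program's, so other
applications lose their files too. This used to be my downfall, and
caused the desktop font to change also. Make sure a handle is valid
before closing it, something like...

IF handle%>0 THEN CLOSE#handle%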

HTH
Alan Adams
2018-11-17 12:21:28 UTC
Post by David Buck
Post by Alan Adams
[snip: original post]
Sounds like a classic CLOSE#0 occurring. This used to be my downfall,
and caused the desktop font to change also. Make sure a handle is
valid before closing it, something like...
IF handle%>0 THEN CLOSE#handle%
HTH
Thanks. That could be it.

I think a PROCclosefile is required, and change all the closes to use
it.
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Alan Adams
2018-11-17 13:10:11 UTC
Post by Alan Adams
Post by David Buck
Post by Alan Adams
[snip: original post]
Sounds like a classic CLOSE#0 occurring. This used to be my downfall,
and caused the desktop font to change also. Make sure a handle is
valid before closing it, something like...
IF handle%>0 THEN CLOSE#handle%
HTH
Thanks. That could be it.
I think a PROCclosefile is required, and change all the closes to use
it.
That was it. The following helped track down the problem:

DEF PROCclosefile(RETURN H%)
IF (DEBUG%AND(1<<21)) AND (H%=0) THEN
*REPORT \R WL: closefile: HANDLE IS ZERO H%
*REPORTSTACK
ENDIF
IF H%<>0 THEN CLOSE#H%:H%=0
ENDPROC

Luckily my guess at the area to change to use this was the area
causing the problem - it was some of the other debugging code.
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Martin
2018-11-17 19:01:12 UTC
Post by Alan Adams
Post by Alan Adams
I think a PROCclosefile is required, and change all the closes to
use it.
That was it.
I am glad if that solved your problem.
Has it also resolved the network & socket issues?

Martin
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.
Alan Adams
2018-11-17 21:04:24 UTC
Post by Martin
Post by Alan Adams
Post by Alan Adams
I think a PROCclosefile is required, and change all the closes to
use it.
That was it.
I am glad if that solved your problem.
Has it also resolved the network & socket issues?
Martin
I can't see why it would have, but so far they haven't appeared. I may
have fixed some of them a few days ago - what I'm doing is trying to make
the system more fault-tolerant, so if a connection drops, it is
automatically reconnected. At one point it was dropping and reconnecting
around once a second, due to a misplaced line of code. I think this caused
socket exhaustion - I could be wrong though.

Now I am just trying to find out why the traffic isn't resuming properly
on the re-established connection. It's getting there, though.
--
Alan Adams, from Northamptonshire
***@adamshome.org.uk
http://www.nckc.org.uk/
Erik G
2018-11-24 19:17:30 UTC
On 17/11/2018 14:10, Alan Adams wrote:

[snip: CLOSE#0 bug]
Post by Alan Adams
Luckily my guess at the area to change to use this was the area
causing the problem - it was some of the other debugging code.
That would make it a rebug :-)

Andy Weir mentions this type of bug in his Casey & Andy comic:
<http://www.galactanet.com/comic/view.php?strip=289>
--
Erik G.
From address is fake
John Williams (News)
2018-11-17 12:26:11 UTC
Post by David Buck
Sounds like a classic CLOSE#0 occurring. This used to be my downfall, and
caused the desktop font to change also. Make sure a handle is valid
before closing it, something like...
IF handle%>0 THEN CLOSE#handle%
IF handle%>0 THEN CLOSE#handle%:handle%=0

John
--
John Williams, now back in the UK - no attachments to these addresses!
Non-RISC OS posters change user to johnrwilliams or put 'risc' in subject!
Who is John Williams? http://petit.four.free.fr/picindex/author/
David Higton
2018-11-17 20:59:01 UTC
Post by David Buck
Sounds like a classic CLOSE#0 occurring. This used to be my downfall, and
caused the desktop font to change also. Make sure a handle is valid before
closing it, something like...
IF handle%>0 THEN CLOSE#handle%
Here's my offering, in the hopes that it may help someone.

I make it a golden rule to only close files by means of a PROC, never
anywhere else. I have two styles, one to close a specific file that
is built into the PROC, and another one that closes a file whose
handle is passed in.

DEF PROCclose_input_file
REM Closes file handle input_file%
LOCAL ERROR
ON ERROR LOCAL input_file% = 0
IF input_file% <> 0 THEN
CLOSE#input_file%
input_file% = 0
ENDIF
ENDPROC

DEF PROCclose_file(RETURN h%)
REM Closes file handle h%
LOCAL ERROR
ON ERROR LOCAL h% = 0
IF h% <> 0 THEN
CLOSE#h%
h% = 0
ENDIF
ENDPROC

The above typed from memory, so E&OE.

I do a similar thing for sockets. The difference is that socket number
0 is valid, and a closed socket is -1, so you can work out the required
changes.
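For illustration, the socket version might look like the following. This is a sketch modelled on PROCclose_file above, not Dave's own code; it assumes the Internet module's Socket_Close SWI and that every socket variable is set to -1 at startup and whenever it is closed.

```basic
REM Sketch: close socket s% and mark it closed
REM For sockets 0 is a valid descriptor, so -1 means "no socket open"
DEF PROCclose_socket(RETURN s%)
LOCAL ERROR
ON ERROR LOCAL s% = -1 : REM on error, mark as closed and fall through
IF s% >= 0 THEN
  SYS "Socket_Close", s%
  s% = -1
ENDIF
ENDPROC
```

Testing s% >= 0 rather than s% <> -1 also skips any other negative value that might have crept into the variable.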

Dave