Accessing a password-protected page via wget
Harriet Bazley
2022-01-21 13:43:58 UTC
I've been trying to use wget to retrieve a page that is only accessible
to logged-in users (user stats - so that I can analyse them and keep a
running record of changes).
Basically, I can't seem to get the correct syntax for the site to
receive/recognise my name and password in the first place, let alone to
serve up the stats page requested....


I don't really know how to use the relevant features of
wget and have been flailing around rather at random. Simply using

wget --ask-password STATS_URL

doesn't produce the desired result; it prompts for the password all
right, but when I supply it the fetch then gets redirected to retrieve
the log-in page instead, just as if I had supplied no password or the
wrong one.

wget --user=USERNAME --ask-password STATS_URL

prompts "Password for user" instead of just "Password", but still
doesn't seem to pass the required data.

Same result from

wget --user=USERNAME --password=PASS STATS_URL

(the retrieved page states 'sorry, you don't have access to view the
page you were trying to reach, please log in')
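A note on why these attempts fail: `--user`, `--password` and `--ask-password` supply credentials for HTTP authentication (Basic or Digest). Those credentials travel in an `Authorization` request header, not through a login form. Recent versions of Wget also hold the credentials back until the server answers with a 401 challenge (`--auth-no-challenge` sends Basic credentials up front instead). A site that simply redirects to an HTML login page never issues that challenge. The minimal sketch below, assuming a POSIX shell with `base64` available, shows what such a header contains. `basic_auth_header` and the example credentials are made up for illustration.

```sh
#!/bin/sh
# Sketch: the HTTP Basic "Authorization" header that --user/--password feed.
# A site that logs users in through an HTML form does not read this header.
basic_auth_header() (
    # $1 = user name, $2 = password (example values only).
    # printf adds no trailing newline; tr removes any line wrapping base64 adds.
    creds=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
    printf 'Authorization: Basic %s\n' "$creds"
)
```

`basic_auth_header alice secret` prints `Authorization: Basic YWxpY2U6c2VjcmV0`. A form-based site most likely never looks at this header, so forcing it with `--auth-no-challenge` would probably not help here either.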


After looking for advice on the Web I tried fetching the log-in page
directly using the same methods and using --keep-session-cookies before
running a second command to fetch the stats page immediately afterwards,
but that didn't work. It fetches the login page, then redirects and
fetches it again under a different name, the only difference being the
error:

<div class="flash error">Sorry, you don&#39;t have permission to access the page you were trying to reach. Please log in.</div>


I then tried using --save-cookies followed by --load-cookies for the
second request, but that didn't work, doubtless because the resulting
'cookies' file had no content:

# HTTP cookie file.
# Generated by Wget on 2022-01-21 13:37:33.
# Edit at your own risk.


I then tried

wget --post-data 'user_login=USERNAME&user_password=PASS' LOGIN_URL

where the relevant form reads

<dt><label for="user_login">User name or email:</label></dt>
<dd><input type="text" name="user[login]" id="user_login"/></dd>
<dt><label for="user_password">Password:</label></dt>
<dd><input type="password" name="user[password]" id="user_password"/></dd>
<dt><label for="user_remember_me">Remember me</label></dt>
<dd><input name="user[remember_me]" type="hidden" value="0"/><input type="checkbox" value="1" name="user[remember_me]" id="user_remember_me"/></dd>
<dt class="landmark">Submit</dt>
<dd class="submit actions">
<input type="submit" name="commit" value="Log in" class="submit"/>
</dd>

but still had no luck.

I'm simply not managing to submit the name/password combination in any
way that the site will acknowledge.
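One likely problem with the `--post-data` attempt is that the server reads the form's `name` attributes (`user[login]`, `user[password]`), not the `id`s (`user_login`, `user_password`). Another is that the string must be URL-encoded by hand, because Wget transmits whatever data it is given. The sketch below assumes a POSIX shell and ASCII input, and `urlencode` and `build_login_post` are hypothetical helper names. The optional third argument is for the hidden `authenticity_token` field quoted later in the thread. A form that looks like a Ruby on Rails one, as this does, will most likely insist on it as well.

```sh
#!/bin/sh
# Percent-encode one string for application/x-www-form-urlencoded data.
# Letters, digits and - . _ ~ pass through; every other character becomes %XX.
# Runs in a subshell with LC_ALL=C so characters are handled as single bytes.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)

# Build the POST body from the form's name= attributes (not its id= values).
# $1 = user name, $2 = password, $3 = authenticity token (optional).
build_login_post() (
    data="$(urlencode 'user[login]')=$(urlencode "$1")&$(urlencode 'user[password]')=$(urlencode "$2")"
    if [ -n "$3" ]; then
        data="$data&authenticity_token=$(urlencode "$3")"
    fi
    printf '%s\n' "$data"
)
```

For example, `build_login_post alice 's3cret!'` gives `user%5Blogin%5D=alice&user%5Bpassword%5D=s3cret%21`, which can then be handed to `--post-data` along with the cookie options.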
--
Harriet Bazley == Loyaulte me lie ==

We are not punished for our sins, but by them.
Harriet Bazley
2022-01-21 15:11:42 UTC
On 21 Jan 2022 as I do recall,
Post by Harriet Bazley
> I've been trying to use wget to retrieve a page that is only accessible
> to logged-in users (user stats - so that I can analyse them and keep a
> running record of changes).
> Basically, I can't seem to get the correct syntax for the site to
> receive/recognise my name and password in the first place, let alone to
> serve up the stats page requested....
I suspect this may have something to do with it:

<div id="loginform">
<form class="new_user" id="new_user" action="/users/login" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="&#x2713;"/><input type="hidden" name="authenticity_token" value="VfGGu3jwjsf6xNQmlmuu3Qkgc1BsZzgu0ikhluwqmVHU9RFVQQUUANuaza9HFgXr_c71SiKwBLz8XA8bQ4hSOA"/>

Unfortunately reading and submitting the 'authenticity token' remotely
might be a bit tricky, as I assume it's intended to prevent precisely
that!
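The token can in fact be read and sent back by a script. A Rails-style `authenticity_token` is meant to stop *other websites* forging requests from a browser that is already logged in, not to stop scripted logins. It is typically tied to the session cookie set when the login page is served, so it has to be posted back using the same cookie jar that fetched the form. The sketch below pulls the value out of a saved copy of the page. It assumes a POSIX shell and that `name=` precedes `value=` inside the tag, as in the snippet quoted above, and `extract_token` is a hypothetical name.

```sh
#!/bin/sh
# Print the value of the first hidden input named "authenticity_token"
# found in HTML read from stdin. Relies on name="..." coming before
# value="..." inside the tag; a real HTML parser would be more robust.
extract_token() (
    sed -n 's/.*name="authenticity_token" value="\([^"]*\)".*/\1/p' | head -n 1
)
```

Typical use would be `wget --save-cookies cookies.txt --keep-session-cookies -O login.html LOGIN_URL` followed by `token=$(extract_token < login.html)`.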

I've tried pointing the --load-cookies option at the cookie file from a
logged-in copy of Netsurf, but the cookie format is evidently not
compatible.
--
Harriet Bazley == Loyaulte me lie ==

Lies, damned lies and user documentation.
Kevin Wells
2022-01-21 16:54:24 UTC
Post by Harriet Bazley
> I've been trying to use wget to retrieve a page that is only accessible
> to logged-in users (user stats - so that I can analyse them and keep a
> running record of changes).
> Basically, I can't seem to get the correct syntax for the site to
> receive/recognise my name and password in the first place, let alone to
> serve up the stats page requested....
> I don't really know how to use the relevant features of
> wget and have been flailing around rather at random. Simply using
If you use the -S option you get the server response, which, if used with
the -o option, you can then save and see what the server is saying.
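To act on that suggestion: `-S` (`--server-response`) prints the response headers, and `-o FILE` writes Wget's messages to FILE instead of the terminal. The resulting log can be searched for the cookies the server tried to set and compared with what `--save-cookies` actually wrote. The small sketch below assumes a POSIX shell and one header per line in the log, and `cookie_names_from_log` is a hypothetical helper.

```sh
#!/bin/sh
# List the names of cookies the server tried to set, from a log written by
# "wget -S -o LOGFILE ...". Header lines are indented in the log; the match
# is case-insensitive because servers may send Set-Cookie or set-cookie.
cookie_names_from_log() (
    grep -i '^[[:space:]]*set-cookie:' "$1" |
        sed -e 's/^[[:space:]]*[Ss][Ee][Tt]-[Cc][Oo][Oo][Kk][Ii][Ee]:[[:space:]]*//' \
            -e 's/=.*//'
)
```

For example, run `wget -S -o wget.log --save-cookies cookies.txt --keep-session-cookies LOGIN_URL`, then `cookie_names_from_log wget.log`. Any name listed there but missing from `cookies.txt` points at a cookie-saving problem rather than a login problem.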
--
Kev Wells
http://kevsoft.co.uk/ https://ko-fi.com/kevsoft
carpe cervisium
I went into a theatre as sober as could be,
Harriet Bazley
2022-01-21 17:45:16 UTC
On 21 Jan 2022 as I do recall,
Post by Kevin Wells
Post by Harriet Bazley
>> I've been trying to use wget to retrieve a page that is only accessible
>> to logged-in users (user stats - so that I can analyse them and keep a
>> running record of changes).
>> Basically, I can't seem to get the correct syntax for the site to
>> receive/recognise my name and password in the first place, let alone to
>> serve up the stats page requested....
> If you use the -S option you get the server response, which, if used with
> the -o option, you can then save and see what the server is saying.
Well, that's interesting - it's doing a 'set-cookie', but no cookies are
being stored by wget....


HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx/1.19.6
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
referrer-policy: strict-origin-when-cross-origin
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-download-options: noopen
x-permitted-cross-domain-policies: none
set-cookie: _otwarchive_session=WVRjTGJ4U2dHck5NWXIrRlZXaGhHZmpZSjRxODZCa2hodDRwTWFlQ0VUOVJiVmcxVEtDaGNSaU9XRmthWjNRL3ljeXlnY1dCc0F5Q2pCblZNbE9mWk5obVNreC9PT1JVU2Y1YmY1Rkd1OWVxSVlVYkxnSlVDY3FrMlZrNmhVZVB3QVFRdGVGTU9ETk5ZalFEWGVqeDZudllHYUJ5R3VIUTV4OUU0RkZTVkFpZ1ZBd2E2SDJ2a3JvZkdxbkZJYWpCLS1CeTB2dFpxV2kzbUdWYXFpZGpxbTFBPT0%3D--55a268d9001c5202764dff147620a50e8e226676; path=/; expires=Fri, 04 Feb 2022 17:43:11 GMT; HttpOnly
x-request-id: 484a28ab-af27-47b6-bc77-b96615c72331
x-runtime: 0.029676

[snip]
--
Harriet Bazley == Loyaulte me lie ==

It is better to have loved and lost than just to have lost.
Steve Fryatt
2022-01-21 18:57:36 UTC
On 21 Jan, Harriet Bazley wrote in message
Post by Harriet Bazley
> Well, that's interesting - it's doing a 'set-cookie', but no cookies are
> being stored by wget....
Are you using wget's --save-cookies and --keep-session-cookies options? You
then load them with --load-cookies on subsequent calls.

I /assume/ that you would do this on each call to wget, passing the cookies
from call to call in that way, but I've just skimmed the man page on Linux
so a) I've not tried it, and b) I've no idea how the current version on an
Ubuntu box relates to what we have on RISC OS.
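Steve's suggestion, combined with the `authenticity_token` in the form Harriet quoted, might look like the untested outline below. It rests on several assumptions. LOGIN_URL is taken to be the form's `action` too (the quoted form posts to `/users/login`). The user name and password are assumed to contain only letters, digits and `-._~`, so they need no percent-encoding. `fetch_after_login` and the file names are placeholders. Every call that should update the jar passes `--save-cookies` with `--keep-session-cookies`, and every call after the first passes `--load-cookies` for the same file.

```sh
#!/bin/sh
# Untested outline: log in through an HTML form with wget, then fetch a
# protected page. URLs, credentials and file names are placeholders.
# Assumes the user name and password need no percent-encoding.
fetch_after_login() (
    login_url=$1 stats_url=$2 user=$3 pass=$4 out=$5
    jar=cookies.txt
    set -e

    # 1. Fetch the login form and keep the session cookie it sets.
    wget -q --save-cookies "$jar" --keep-session-cookies \
        -O login.html "$login_url"

    # 2. Read the hidden authenticity_token and percent-encode the
    #    characters base64 may contain (+ / =).
    token=$(sed -n 's/.*name="authenticity_token" value="\([^"]*\)".*/\1/p' login.html |
        head -n 1 | sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g')
    [ -n "$token" ] || { echo "no authenticity_token found" >&2; exit 1; }

    # 3. Post the fields by their name= attributes, sending and re-saving
    #    the same cookie jar so the logged-in cookie is recorded.
    wget -q --load-cookies "$jar" --save-cookies "$jar" --keep-session-cookies \
        --post-data "user%5Blogin%5D=$user&user%5Bpassword%5D=$pass&authenticity_token=$token" \
        -O login-result.html "$login_url"

    # 4. Fetch the protected page with the logged-in cookies.
    wget -q --load-cookies "$jar" -O "$out" "$stats_url"
)
```

Usage would be along the lines of `fetch_after_login LOGIN_URL STATS_URL USERNAME PASS stats.html`. Checking `login-result.html` for the "please log in" message shows whether step 3 worked.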

The Google search that led me to the above also mentioned loading in cookies
saved from Firefox as you describe, and suggested that care needs to be
taken so as not to include any cookies from other sites in the process. I'd
suggest that using wget for the whole thing, and not trying to apply cookies
saved from a browser, might be a safer option.
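On keeping other sites' cookies out: a Netscape-format cookie file holds one cookie per line, with the domain in the first of seven TAB-separated fields. A browser export could therefore be cut down to a single site before it is passed to `--load-cookies`. The sketch below assumes a POSIX shell with `awk`, and `filter_cookies` is a hypothetical name. It keeps only entries for the exact host or its leading-dot form and drops everything else, including subdomain entries.

```sh
#!/bin/sh
# Keep only one site's cookies from a Netscape-format cookies.txt on stdin.
# $1 is the site's host name; lines whose domain field is exactly that host
# or ".host" are kept, everything else (comments included) is dropped.
filter_cookies() (
    printf '# HTTP cookie file.\n'    # the header line wget itself writes
    awk -F '\t' -v d="$1" '
        /^#/ || NF < 7 { next }       # skip comments and malformed lines
        $1 == d || $1 == "." d { print }
    '
)
```

For example: `filter_cookies example.org < exported.txt > cookies.txt`, with `example.org` standing in for the real site.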
--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/
Harriet Bazley
2022-01-21 20:28:42 UTC
On 21 Jan 2022 as I do recall,
Post by Steve Fryatt
> On 21 Jan, Harriet Bazley wrote in message
Post by Harriet Bazley
>> Well, that's interesting - it's doing a 'set-cookie', but no cookies are
>> being stored by wget....
> Are you using wget's --save-cookies and --keep-session-cookies options? You
> then load them with --load-cookies on subsequent calls.
Yes -- as I mentioned in my original post, I end up with a blank cookies
file when I use the save-cookies option.

------------------------------------------------------------------------------
# HTTP cookie file.
# Generated by Wget on 2022-01-20 22:47:54.
# Edit at your own risk.
druck
2022-01-22 10:25:36 UTC
Post by Harriet Bazley
> I've been trying to use wget to retrieve a page that is only accessible
> to logged-in users (user stats - so that I can analyse them and keep a
> running record of changes).
> Basically, I can't seem to get the correct syntax for the site to
> receive/recognise my name and password in the first place, let alone to
> serve up the stats page requested....
It will be possible to do this with wget (or curl), but as you can see
from the other responses, it involves sprinkling fairy dust over the
correct magic runes. It may be easier to use Python with the requests
module for this, as it can set up auth headers and suchlike.

---druck
Harriet Bazley
2022-01-22 12:17:13 UTC
On 22 Jan 2022 as I do recall,
Post by druck
Post by Harriet Bazley
>> I've been trying to use wget to retrieve a page that is only accessible
>> to logged-in users (user stats - so that I can analyse them and keep a
>> running record of changes).
>> Basically, I can't seem to get the correct syntax for the site to
>> receive/recognise my name and password in the first place, let alone to
>> serve up the stats page requested....
> It will be possible to do this with wget (or curl), but as you can see
> from the other responses, it involves sprinkling fairy dust over the
> correct magic runes. It may be easier to use Python with the requests
> module for this, as it can set up auth headers and suchlike.
Given that I can *see* the relevant cookies in Netsurf (and can copy
them from the text file they're stored in), it might be easier just to
find the format used by Wget and manually construct a file to be used
via --load-cookies.
Though I'm not actually certain that I've got the cookie-handling side of
Wget working at all. Can anyone suggest a test page that *should* work
without requiring hidden hashed magic values?
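Wget reads the Netscape `cookies.txt` format: optional `#` comment lines, then one cookie per line as seven TAB-separated fields (domain, include-subdomains flag, path, secure flag, expiry as Unix time, name, value). According to the Wget manual, session cookies saved with `--keep-session-cookies` are written with an expiry of 0, and `--load-cookies` recognises them as session cookies again. The sketch below assumes a POSIX shell and writes such a file from a name and value copied out of a browser. `make_cookie_jar` and HOST are placeholders. The path `/` and non-Secure defaults follow the `set-cookie` header shown earlier in the thread.

```sh
#!/bin/sh
# Write a one-cookie jar in the Netscape format that --load-cookies reads.
# $1 = host name, $2 = cookie name, $3 = cookie value,
# $4 = expiry in Unix time (default 0, which wget treats as a session cookie).
make_cookie_jar() (
    printf '# HTTP cookie file.\n'
    # TAB-separated: domain, include-subdomains, path, secure, expiry, name, value.
    # FALSE, "/", FALSE: host-only cookie, whole site, not restricted to HTTPS.
    printf '%s\tFALSE\t/\tFALSE\t%s\t%s\t%s\n' "$1" "${4:-0}" "$2" "$3"
)
```

For example, `make_cookie_jar HOST _otwarchive_session 'VALUE' > cookies.txt` followed by `wget --load-cookies cookies.txt STATS_URL`. VALUE has to be the cookie the browser holds while logged in, copied exactly as stored.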
--
Harriet Bazley == Loyaulte me lie ==

What's the point in being grown up if you can't be childish sometimes?