Copyright (c) 2002 Hauke Dämpfling, version 1.1 / 13.5.2002, http://www.zero-g.net/gwebcache/
(ripped with many thanks from Info Anarchy's summary)
The goal of the "Gnutella Web Caching System" (the "cache") is to eliminate the "Initial Connection Point Problem" of a fully decentralized network: where do I find a first host to connect to? The cache is a program (script), placed on any web server, that stores IP addresses of hosts in the Gnutella network and URLs of other caches. Gnutella clients connect to a cache chosen at random from their list. They send IP addresses and URLs to the cache and receive them from it. Because clients pick caches at random, all caches should eventually learn about each other, and all caches should hold relatively fresh hosts and URLs. The concept is independent of any particular Gnutella client.
Interaction with the web server and cache is a series of HTTP GET requests and responses. Support for POST requests is optional and not necessary. The following specifications describe the GET requests and the expected responses, as well as the expected behavior of the script and the client. The notation url?query denotes the URL of a script with a query string attached, where "query" is a series of name=value pairs separated by "&". These name/value pairs must be "URL-encoded", as described, for example, in RFC 1738. Because of differences between operating systems, responses may be LF-, CRLF-, or CR-terminated, but should be of Content-Type "text/*". Responses are interpreted line by line.
Tip: GET requests are easier than they may sound: the query (the information or request you are sending to the script) is simply part of the URL. For example, if the request is url?ip1=192.168.0.1:123, you simply open the following URL using whatever web functions your programming language provides:
http://www.somehost.com/path/to/script.php?ip1=192.168.0.1:123
There are only two tricky parts. First, the URL-encoding: your best bet is to look for existing functions, since they have often already been written by someone and may already be part of your libraries. Second, interpreting the end-of-line characters in the responses: again, libraries often already provide functions that read responses line by line, taking the end-of-line characters into account.
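As an illustration of the tip above, here is a minimal Python sketch using only the standard library (`urllib.parse.urlencode` for the URL-encoding, `urllib.request.urlopen` for the request). The function names, the 20-second timeout, and the UTF-8 decoding are assumptions for illustration:

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(script_url: str, params: Dict[str, str]) -> str:
    """Append URL-encoded name=value pairs to a cache script URL."""
    # urlencode joins the pairs with "&" and percent-encodes reserved
    # characters, e.g. ":" becomes "%3A" and "/" becomes "%2F".
    return script_url + "?" + urlencode(params)


def fetch(script_url: str, params: Dict[str, str], timeout: float = 20.0) -> str:
    """Send one GET request to a cache and return the response body as text."""
    # urlopen's default opener follows the common HTTP redirects (301/302/303/307).
    with urlopen(build_request_url(script_url, params), timeout=timeout) as resp:
        # Assumption: decode as UTF-8; the spec only asks for Content-Type "text/*".
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch("http://www.somehost.com/path/to/script.php", {"hostfile": "1"})` would request url?hostfile=1. `urlencode` encodes the colon in 192.168.0.1:123 as %3A; a script that decodes its query string in the usual way sees the same value as in the unencoded example URL.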
Clients generally keep an internal cache of the IP addresses of known Gnutella nodes. In addition to this list, they should also keep an internal list of web caches. When making requests, a client should pick a cache from its internal list (a different one every time). Clients should remove invalid nodes and URLs from their internal caches. Combined with regular update requests, this keeps the "network" of web caches intact: clients will (should) not submit the URLs of non-functional scripts to functional caches, and if a cache still holds the URL of a non-functional script, that URL will soon be "phased out" as regular update requests bring in newer entries.
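The "different cache every time" rule and the removal of invalid entries could be sketched in Python as follows. `CacheList` is a hypothetical helper, and "different" is interpreted here as "not the same cache as the previous request", which is the simplest reading:

```python
import random
from typing import Iterable, List, Optional


class CacheList:
    """Hypothetical client-side list of known web cache URLs."""

    def __init__(self, urls: Iterable[str]) -> None:
        self.urls: List[str] = list(dict.fromkeys(urls))  # de-duplicate, keep order
        self.last: Optional[str] = None

    def pick(self) -> Optional[str]:
        """Return a random cache, avoiding the previous choice when possible."""
        if not self.urls:
            return None
        # With only one cache left there is nothing else to choose from.
        candidates = [u for u in self.urls if u != self.last] or self.urls
        self.last = random.choice(candidates)
        return self.last

    def remove(self, url: str) -> None:
        """Drop a cache that turned out to be invalid."""
        if url in self.urls:
            self.urls.remove(url)
```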
Of course, all developers should make sure that the interactions described here are strictly followed, and should never release scripts or clients without proper in-house testing, so as not to disrupt the integrity of the network.
Security Note: Clients and/or scripts may not be able to verify that the nodes and URLs in their caches are valid. Clients and scripts should therefore have security measures in place against possible errors in scripts or clients, mischief, and DoS (Denial of Service) attacks. Examples include: verifying URLs by sending Ping requests, automatically removing caches that behave "strangely" (errors, invalid caches and hosts), and limiting how often a single client may send requests.
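One of the measures listed above, limiting how often a single client may send requests, can be sketched on the script side as a per-address minimum interval. The class name and the 60-second default are assumptions, not values from this specification:

```python
import time
from typing import Dict, Optional


class RequestThrottle:
    """Hypothetical guard allowing one request per client address
    within `min_interval` seconds."""

    def __init__(self, min_interval: float = 60.0) -> None:
        self.min_interval = min_interval
        self.last_seen: Dict[str, float] = {}

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if the client may be served."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(client_ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; a refused request does not reset the timer
        self.last_seen[client_ip] = now
        return True
```

In a CGI-style script this table would have to be persisted between requests (for example, in a file) and pruned periodically. How to answer a refused request is left open here; note that clients drop caches that return ERROR often.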
The client wishes to receive a list of Gnutella nodes. | |
Request: | url?hostfile=1 |
Response: | A line-separated list of Gnutella nodes in the format "ip:port" (numerical IPs only). The list should not be very long (around 20 nodes) and should contain only the newest entries. OR a redirect (HTTP 3xx) response, indicating that the client needs to send another HTTP GET request for the file. Clients must support this method; fortunately, many standard HTTP libraries follow redirects automatically. When a client follows the redirect, it should receive a list as described above. OR the string "ERROR", possibly followed by more specific error information. |
Client-Side: | A client should send this request whenever it needs hosts to connect to. Clients should be able to handle lists of varying size in responses, including empty responses. Clients should remove a web cache from their internal lists if it returns ERROR messages (or fails to respond correctly at all) more than a few times in a row. |
Server-Side: | As noted above, scripts need not and should not return many hosts (only ~20). See the comment on the ERROR response in the notes for the "Update" request below. |
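A client-side parser for the hostfile response might look like the sketch below. It assumes IPv4 addresses (the "numerical IPs only" rule), relies on the HTTP library to have already followed any redirect, and returns None for an ERROR response so the caller can count it as a failure; the function name is hypothetical:

```python
import ipaddress
import re
from typing import List, Optional, Tuple

# "ip:port" with a dotted numerical address; hostnames are not accepted.
HOST_LINE = re.compile(r"([0-9.]+):([0-9]{1,5})")


def parse_hostfile(body: str) -> Optional[List[Tuple[str, int]]]:
    """Parse a hostfile=1 response body.

    Returns None for an ERROR response, otherwise a (possibly empty) list of
    (ip, port) pairs. Malformed lines are skipped rather than rejected.
    """
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if lines and lines[0].startswith("ERROR"):
        return None
    hosts = []
    for line in lines:
        match = HOST_LINE.fullmatch(line)
        if not match:
            continue
        try:
            ip = ipaddress.IPv4Address(match.group(1))
        except ValueError:
            continue  # e.g. "999.1.1.1" or a wrong number of octets
        port = int(match.group(2))
        if 1 <= port <= 65535:
            hosts.append((str(ip), port))
    return hosts
```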
The client wishes to receive a list of alternate web cache URLs. | |
Request: | url?urlfile=1 |
Response: | A line-separated list of alternate web caches' URLs. The list should not be very long (around 20 URLs) and should contain only the newest entries. OR a redirect (HTTP 3xx) response, indicating that the client needs to send another HTTP GET request for the file. Clients must support this method; fortunately, many standard HTTP libraries follow redirects automatically. When a client follows the redirect, it should receive a list as described above. OR the string "ERROR", possibly followed by more specific error information. |
Client-Side: | A client should send this request to build its internal list of caches (for example, once on startup). Clients should be able to handle lists of varying size in responses, including empty responses. Clients should remove a web cache from their internal lists if it returns ERROR messages (or fails to respond correctly at all) more than a few times in a row. |
Server-Side: | As noted above, scripts need not and should not return many URLs (only ~20). See the comment on the ERROR response in the notes for the "Update" request below. |
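The hostfile and urlfile requests (and the update request below) all ask clients to drop a cache that fails "more than a few times in a row". A minimal Python sketch of that bookkeeping follows; the threshold of 3 and the `FailureTracker` name are assumptions:

```python
from typing import Dict, Iterable


class FailureTracker:
    """Hypothetical helper: forget a cache after `max_failures` consecutive
    bad responses (ERROR or no valid answer); a good response resets it."""

    def __init__(self, caches: Iterable[str], max_failures: int = 3) -> None:
        self.caches = set(caches)
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}

    def record(self, url: str, ok: bool) -> None:
        if url not in self.caches:
            return
        if ok:
            self.failures[url] = 0  # "in a row": any success starts over
            return
        self.failures[url] = self.failures.get(url, 0) + 1
        if self.failures[url] >= self.max_failures:
            self.caches.discard(url)
            del self.failures[url]
```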
The client wishes to submit IP addresses and/or alternate web cache URLs to a cache. | |
Request: | url?ip1=XXX.XXX.XXX.XXX:PORT&url1=http://WWW.SOMEHOST.COM/PATH/TO/SCRIPT&ip2=...&url2=... (Reminder: requests need to be URL-encoded; see "Basics".) |
Response: | The first line must be either "OK", "ERROR", or "ERROR: Message". Following lines can be ignored by the client and can be used by the script for warning messages. Note: these two basic responses let the client know that the script is functional (to a certain extent). In other words, if the web server returns anything else (for example, if the response begins with <HTML>), this can be interpreted as a server error of some sort. |
Client-Side: | A client should send this request periodically (about once an hour). For best efficiency, a client should submit only its own IP address and one alternate web cache when it updates. Clients should only send the URLs of web caches that they know to be functional! Clients can handle the responses silently; however, they should remove a web cache from their internal lists if it returns ERROR messages (or fails to respond correctly at all) more than a few times in a row. |
Server-Side: | An OK message usually means that everything went well and the script executed normally. An ERROR message usually indicates some form of fatal error that prevented the script from doing what it is supposed to do. Since clients will (should) remove scripts that return error messages often, scripts should return ERRORs only when they are expected to be down for a while (for example, when the script will be or has been removed from the server, or on server overload, file errors, etc.). Since scripts need to return only a few of the newest hosts and URLs, the oldest entries should simply be removed when new entries are submitted through an update request. Scripts may wish to check the validity of submitted URLs by sending a Ping request, but this is not required. |
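The server-side rule of keeping only the newest ~20 entries maps naturally onto a bounded deque. The sketch below is a hypothetical Python version of the update handling: it accepts ipN and urlN parameters, ignores everything else, and always answers OK. A real script would also validate entries (for example, numerical ip:port only) before storing them:

```python
import re
from collections import deque
from urllib.parse import parse_qs


class CacheStore:
    """Hypothetical script-side storage keeping only the newest entries."""

    def __init__(self, max_entries: int = 20) -> None:
        # A deque with maxlen discards from the far end when full, which is
        # exactly "drop the oldest entry when a new one arrives".
        self.hosts = deque(maxlen=max_entries)
        self.urls = deque(maxlen=max_entries)

    @staticmethod
    def _remember(store: deque, value: str) -> None:
        if value in store:
            store.remove(value)  # a resubmitted entry becomes the newest again
        store.appendleft(value)  # newest first

    def handle_update(self, query: str) -> str:
        """Process an ip1=...&url1=... query and return the response body."""
        for name, values in parse_qs(query).items():
            if re.fullmatch(r"ip[0-9]+", name):
                store = self.hosts
            elif re.fullmatch(r"url[0-9]+", name):
                store = self.urls
            else:
                continue  # unknown parameters are ignored
            for value in values:
                self._remember(store, value)
        return "OK\n"

    def hostfile(self) -> str:
        """Body for a hostfile=1 request: one ip:port per line."""
        return "".join(host + "\n" for host in self.hosts)
```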
A ping/pong scheme to verify that caches are active. | |
Request: | url?ping=1 |
Response: | The first four characters of the response are "PONG", optionally followed by a version number string. |
Client-Side: | This request can and should be used to verify that a URL is valid and that the script behind it is functioning correctly. Note: some scripts, when installed by users on their servers, may answer pings correctly but fail on other requests (mostly because of file access errors and the like), so a successful ping does not fully guarantee that a cache works. |
Server-Side: | ditto |
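A client-side check of the ping response could be as small as the following Python sketch (the name `parse_pong` is hypothetical). It treats anything not starting with PONG, such as an HTML error page, as a failed ping:

```python
from typing import Optional, Tuple


def parse_pong(body: str) -> Tuple[bool, Optional[str]]:
    """Return (alive, version) for a ping=1 response body."""
    # The response must start with the four characters "PONG".
    if not body.startswith("PONG"):
        return False, None
    # Anything after PONG on the first line is the optional version string.
    version = body.splitlines()[0][4:].strip()
    return True, version or None
```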
Other responses that a script can send include HTML information pages, statistics, etc. For example, if no request is sent to the script (i.e., the script is simply browsed to), it could display a page informing the user that "this is a Gnutella web cache" or something similar. Or one could add an extra request, "url?stats=1", that displays an HTML page with some statistics.
In general, script authors can include any extensions they wish, as long as the interaction described above remains unchanged. Clients need not implement any extensions, since the basic interactions will be the same.
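One way for a script to serve the core requests, optional extensions such as url?stats=1, and a fallback information page is a small dispatcher. This Python sketch is hypothetical: the handler tables, the info page text, and the choice to check core requests before extensions are assumptions made so that extensions cannot change the basic interaction:

```python
import re
from typing import Callable, Dict, List, Optional
from urllib.parse import parse_qs

Params = Dict[str, List[str]]
Handler = Callable[[Params], str]

# Hypothetical page shown when the script is simply browsed to.
INFO_PAGE = "<html><body><p>This is a Gnutella web cache.</p></body></html>\n"

UPDATE_PARAM = re.compile(r"(ip|url)[0-9]+")


def dispatch(query: str, core: Dict[str, Handler],
             on_update: Optional[Handler], extensions: Dict[str, Handler]) -> str:
    """Route one request to a handler and return the response body."""
    params = parse_qs(query)
    # Update requests are recognized by their ipN/urlN parameters.
    if on_update is not None and any(UPDATE_PARAM.fullmatch(name) for name in params):
        return on_update(params)
    # Core requests (ping, hostfile, urlfile, ...) are checked before extensions,
    # so an extension can never shadow the basic interaction.
    for table in (core, extensions):
        for trigger, handler in table.items():
            if trigger in params:
                return handler(params)
    # No request, or an unknown one: show the information page.
    return INFO_PAGE
```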
Statistics are regularly collected on all known GWebCache scripts. If the author of a script would like to make statistics from their script available, the following request should be implemented.
Request: | url?statfile=1 |
Response: | Line 1: Total number of requests received. Line 2: Requests received in the last full hour. |
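Producing the two statfile lines requires a running total and a per-hour counter. The Python sketch below is hypothetical and assumes "the last full hour" means the most recent completed clock hour (for example, at 14:20 it reports requests from 13:00 to 14:00); whether the statfile request itself is counted is left to the caller:

```python
import time
from typing import Optional


class RequestStats:
    """Hypothetical counters behind a statfile=1 response."""

    def __init__(self) -> None:
        self.total = 0
        self.current_hour: Optional[int] = None  # hour index being counted
        self.current_count = 0
        self.previous_count = 0  # requests in the hour before current_hour

    def _roll(self, hour: int) -> None:
        if self.current_hour is None:
            self.current_hour = hour
        elif hour != self.current_hour:
            # If a whole hour passed with no requests, the last full hour was empty.
            consecutive = hour == self.current_hour + 1
            self.previous_count = self.current_count if consecutive else 0
            self.current_hour = hour
            self.current_count = 0

    def record(self, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._roll(int(now // 3600))
        self.total += 1
        self.current_count += 1

    def statfile(self, now: Optional[float] = None) -> str:
        """Line 1: total requests. Line 2: requests in the last full hour."""
        now = time.time() if now is None else now
        self._roll(int(now // 3600))
        return f"{self.total}\n{self.previous_count}\n"
```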
v1.1
- Made the suggested client- and server-side behavior more specific.
- Added suggested statistics response.
v1.0
- First release.
Copyright (c) 2002 Hauke Dämpfling. Licensed under FDL.
See also: http://www.gnucleus.net/