# This file contains examples and an explanation for the RULESFILE / RULE
# feature.
#
# Rules for Lynx are experimental.  They provide a rudimentary capability
# for URL rejection and substitution based on string matching.
# Most users and most installations will not need this feature; it is here
# in case you find it useful.  Note that this may change or go away in
# future releases of Lynx; if you find it useful, consider describing your
# use of it in a message to the Lynx developers.
#
# Syntax:
# =======
# As you may have guessed, comments are introduced by a '#' character.
# Rules have the general form
#       Operator  Operand1  [Operand2]
# with words separated by whitespace.
#
# Recognized operators are
#
#   Fail  URL1
#       Reject access to this URL, stop processing further rules.
#
#   Map   URL1  URL2
#       Change the URL to URL2, then continue processing.
#
#   Pass  URL1  [URL2]
#       Accept this URL and stop processing further rules; if URL2
#       is given, apply this as the last mapping.
#
# Rules are processed sequentially, first to last; a rule applies
# if the current URL (for the resource the user is trying to access)
# matches URL1.  Case-sensitive (!) string comparison is used; in addition,
# URL1 can contain one '*', which is interpreted as a wildcard matching
# 0 or more characters.  So if, for example,
# "http://example.com/dir/doc.html" is requested, it would match any of
# the following:
#       Pass  http:*
#       Pass  http://example.com/*.html
#       Pass  http://example.com/*
#       Pass  http://example*
#       Pass  http://*/doc.html
# but not:
#       Pass  http://example/*
#       Pass  http://Example.COM/dir/doc.html
#       Pass  http://Example.COM/*
#
# If a URL2 is given and also contains a '*', that character will be
# replaced by whatever matched in URL1.  Processing stops with the
# first matching "Fail" or "Pass" or when the end of the rules is reached.
# If the end is reached without a "Fail" or "Pass", the URL is allowed
# (equivalent to a final "Pass *").
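#
# As a worked illustration of the processing order described above (the
# host names are made up for this sketch, not taken from a real setup),
# consider these three rules:
#       Map   http://old.example.com/*          http://new.example.com/*
#       Fail  http://new.example.com/private/*
#       Pass  http://new.example.com/*
# A request for "http://old.example.com/dir/doc.html" matches the "Map"
# rule with '*' standing for "dir/doc.html", so the current URL becomes
# "http://new.example.com/dir/doc.html" and processing continues.  The
# "Fail" rule does not match; the "Pass" rule does, so the URL is accepted
# and processing stops.  A request for
# "http://old.example.com/private/x.html" would instead be mapped and then
# rejected by the "Fail" rule, assuming later rules are compared against
# the already mapped URL, as the description of "Map" suggests.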
#
# The requested URL will have been transformed to Lynx's normal
# representation.  This means that local file resources should be
# expected in the form "file://localhost/",
# not in the machine's native representation for filenames.
#
# Anyone with experience configuring the venerable CERN httpd server will
# recognize the syntax - in fact, the code implementing rules goes back
# to a common ancestor.  But note the differences: all URLs and
# URL-patterns here have to be given as absolute URLs, even for local files.
# (Absolute URLs don't imply proxying - you cannot control that from here.)
#
# CAVEAT
# ======
# First, to squash any false expectations, an example of what NOT TO DO.
# It might be expected that a rule like
#       Fail  file://localhost/etc/passwd       # <- DON'T RELY ON THIS
# could be used to prevent access to the file "/etc/passwd".  This might
# fool a naive user, but a more sophisticated user could still gain
# access by experimenting with other forms like (@@@ untested)
# "file:///etc/passwd" or "/etc//passwd"
# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
# There are many URL forms for accessing the same resource, and Lynx
# just doesn't guarantee that URLs for the same resource will look the
# same way.
#
# The same reservation applies to any attempts to block access to unwanted
# sites and so on.  This isn't the right place for implementing it.
# (Lynx has a number of mechanisms documented elsewhere to restrict access;
# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
#
# Some more useful applications:
#
# 1. Disabling URLs by access scheme
# ----------------------------------
#       Fail  gopher:*
#       Fail  finger:*
#       Fail  lynxcgi:*
#       Fail  LYNXIMGMAP:*
#   This should work (but no guarantees) because Lynx canonicalizes
#   the case of recognized access schemes and does not interpret
#   %-escaping in the scheme part (@@@ always?)
#
#   Note that for many access schemes Lynx already has mechanisms to
#   restrict access (see lynx.cfg, -help, -restrictions, etc.); others
#   have to be specifically enabled.  Those mechanisms should be used
#   in preference.
#   Note especially Limitation 1 below.
#   This can be used for the remaining cases, or in addition by the
#   more paranoid.  Note that disabling "file:*" will also make many
#   of the special pages generated by Lynx as temporary files (INFO,
#   history, ...) inaccessible; on the other hand, it doesn't prevent
#   _writing_ of various temp files - probably not what you want.
#
#   You could also direct access for a scheme to a brief text explaining
#   why it's not available:
#       Map   news:*   http://localhost/texts/newsserver-is-broken.html
#   (That text shouldn't contain any relative links; they would be
#   broken.)
#
# 2. Preventing accidental access
# -------------------------------
#   If there is a page or site you don't want to access for whatever
#   reason (say there's a link to it that crashes Lynx [don't forget to
#   report a bug], or it starts sending you a 5 MB file you don't
#   want, or you just don't like the people...), you can prevent yourself
#   from accidentally accessing it:
#       Fail  http://bad.site.com/*
#
# 3. Compressed files
# -------------------
#   You have downloaded a bunch of HTML documents, and compressed them
#   to save space.  Then you discover that links between the files don't
#   work, because they all use the names of the uncompressed files.  The
#   following kind of rule will allow you to navigate, invisibly accessing
#   the compressed files:
#       Map   file://localhost/somedir/*.html  file://localhost/somedir/*.html.gz
#
# 4. Use local copies
# -------------------
#   You have downloaded a tree of HTML documents, but there are many links
#   between them that still point to the remote location.  You want to access
#   the local copies instead; after all, that's why you downloaded them.
#   You could start editing the HTML, but the following might be simpler:
#       Map   http://remote.com/docs/*.html  file://localhost/home/me/docs/*.html
#   Or even combine this with compressing the files:
#       Map   http://remote.com/docs/*.html  file://localhost/home/me/docs/*.html.gz
#
# 5. Broken links etc.
# --------------------
#   A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
#   or http://www.provider.com/company/ has moved to their own server
#   http://www.company.com, but there are still links to the old location
#   all over the place; they now are broken or lead to a stupid "this page
#   has moved, please update your bookmarks.  Refresh in 5 seconds" page
#   which you're tired of seeing.  This will not fix your bookmarks, and
#   it will let you see the outdated URLs for longer (Limitation 3 below),
#   but for a quick fix:
#       Map   http://www.siteA.com/~jdoe/*        http://siteB.org/john/*
#       Map   http://www.provider.com/company/*   http://www.company.com/*
#   But note that you are likely to create invalid links if not all documents
#   from a site are mapped (Limitation 3).
#
# 6. DNS troubles
# ---------------
#   A special case of broken links.  If a site is inaccessible because the
#   name cannot be resolved (your or their name server is broken, or the
#   name registry once again made a mistake, or they really didn't pay in
#   time...) but you still somehow know the address; or if name lookups are
#   just too slow:
#       Map   http://www.somesite.com/*   http://10.1.2.3/*
#   (You could do the equivalent more cleanly by adding an entry to the hosts
#   file, if you have access to it.)
#
#   Or, if a name resolves to several addresses of which one is down, and the
#   DNS hasn't caught up:
#       Map   http://www.w3.org/*   http://www12.w3.org/*
#
#   Note that this can break access to some name-based virtually hosted sites.
#
# Limitations
# ===========
# First, see CAVEAT above.  There are other limitations:
#
# 1. Applicable URL schemes
# -------------------------
#   Rules processing does not apply to all URL schemes.  Some are
#   handled differently from the generic access code; therefore rules
#   for such URLs will never be "seen".  This limitation applies at
#   least to lynxexec:, lynxprog:, mailto:, and LYNXHIST: URLs.
#
#   Also, a scheme has to be known to Lynx in order to get as far as
#   applying rules - you cannot just define your own new foobar: scheme
#   and then map it to something here.
#
# 2. No re-checking
# -----------------
#   When a URL is mapped to a different one, the new URL is not checked
#   again for compliance with most restrictions established by -anonymous,
#   -restrictions, lynx.cfg and so on.  This can be regarded as a feature:
#   it allows specific exceptions.  Of course it means that users for
#   whom any restrictions must be enforced cannot have write access to a
#   personal rules file, but that should be obvious anyway!
#
# 3. Mappings are invisible
# -------------------------
#   Changing the URL with "Map" or "Pass" rules will in general not be
#   visible to the user, because it happens at a late stage of processing
#   a request (similar to directing a request through a proxy).  One
#   can think of two kinds of URL for every resource: a "Document URL" as
#   the user sees it (on INFO page, history list, status line, etc.), and
#   a "physical URL" used for the actual access.  Rules change only the
#   physical URL.  This is different from the effect of HTTP redirection.
#   Often this is bad; sometimes it may be desirable.
#
#   Changing the URL can create broken links if a document has relative URLs,
#   since they are taken to be relative to the "Document URL" (if no BASE tag
#   is present) when the HTML is parsed.
#
# 4. Interaction with proxying
# ----------------------------
#   Rules processing is done after most other access checks, but before
#   proxy (and gateway) settings are examined.  A "Fail" rule works
#   as expected, but when the URL has been mapped to a different one,
#   the subsequent proxy checking can get confused.  If it decides that
#   access is through a proxy or gateway, it will generally use the
#   original URL to construct the "physical" URL, effectively overriding
#   the mapping rules.  If the mapping is to a different access scheme
#   or hostname, proxy checking could also be fooled into using a proxy
#   when it shouldn't, into not using one when it should, or (if different
#   proxies are used for different schemes) into using the wrong proxy.
#   So "just don't do that"; in some cases setting the no_proxy variable
#   will help.  Example 3 happens to work nicely if there is a http_proxy
#   but no ftp_proxy.