Requestors: URL Requestor
Morphis includes the URL Requestor, which, given
a URL to a document, fetches the contents of that document and sends them to
a Translator. The supported URL schemes are http, https, ftp and file.
The URL Requestor is useful for translating documents
residing on other servers (via http:// or ftp://) and even on the local
filesystem (file://). This is particularly useful when the translation source
streams come from other web content management systems.
Please note that the Java Secure Socket Extension (JSSE) must be
installed for the https scheme to work. It may be downloaded from the
Java website.
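The fetch step can be pictured with the standard java.net.URL machinery, which ships protocol handlers for all four listed schemes. This is a minimal sketch under that assumption, not the org.morphis.URLRequestor source: the class UrlFetchSketch and its methods are hypothetical, and the document is assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical sketch of the URL Requestor's fetch step; NOT the Morphis source.
class UrlFetchSketch {
    // Schemes the URL Requestor documentation lists as supported.
    static final Set<String> SUPPORTED = Set.of("http", "https", "ftp", "file");

    static boolean isSupported(String url) {
        try {
            return SUPPORTED.contains(URI.create(url).toURL().getProtocol());
        } catch (MalformedURLException | IllegalArgumentException e) {
            // Bad syntax, a relative URL, or a scheme the JVM has no handler for.
            return false;
        }
    }

    // Reads the whole document as UTF-8 text. A real requestor would presumably
    // hand the stream to a Translator instead of returning a String.
    static String fetch(String url) throws IOException {
        if (!isSupported(url)) {
            throw new IllegalArgumentException("unsupported URL: " + url);
        }
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Because file: URLs go through the same code path as http:, https: and ftp:, a local document can be read exactly like a remote one.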
When using the HTTP scheme, the following additional
features are provided:
- Session cookies: cookies set by the
destination server will be temporarily stored by Morphis, and sent back
to the destination server during appropriate requests. This is implemented
by keeping session state with the client. The HTTP proxy requestor does
not currently support persistent cookies set by the destination server.
- Caching: If the stream from the destination
server includes an Expires: header set in the future, the
HTTP proxy will cache the stream until the expiration date is reached.
If the Expires: header is not set, you can force
the page to be cached by setting the requestor-cache-max-age
processing parameter. You may disable caching by setting the url-requestor-cache
processing parameter to false. By default, it is set to
true. The HTTP proxy requestor will also refuse to cache
documents whose Cache-Control: header includes no-cache.
- Redirection: An HTTP stream returning
a 3xx status code will automatically be re-requested from the URL
given in its Location: header.
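The caching rules in the list above can be condensed into a single decision function. This is a hedged sketch: the class and method names are hypothetical, and two details the text leaves open are assumptions here, namely that an Expires: date already in the past disables caching, and that an unparseable Expires: value counts as already expired (the interpretation RFC 9111 asks caches to use). Session cookies and redirection are not modelled.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical sketch of the caching rules described above; not Morphis source.
class CachePolicySketch {
    static final long DO_NOT_CACHE = -1;

    /**
     * @param cacheEnabled  value of url-requestor-cache (true if unset)
     * @param maxAgeSeconds value of requestor-cache-max-age (-1 if unset)
     * @param expires       raw Expires: header, or null if absent
     * @param cacheControl  raw Cache-Control: header, or null if absent
     * @param now           current time, passed in so the logic is testable
     * @return seconds to keep the document, or DO_NOT_CACHE
     */
    static long cacheSeconds(boolean cacheEnabled, long maxAgeSeconds,
                             String expires, String cacheControl, Instant now) {
        if (!cacheEnabled) return DO_NOT_CACHE;           // url-requestor-cache=false
        if (cacheControl != null
                && cacheControl.toLowerCase(Locale.ROOT).contains("no-cache")) {
            return DO_NOT_CACHE;                          // server forbids caching
        }
        if (expires != null) {
            try {
                Instant until = ZonedDateTime
                        .parse(expires.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                        .toInstant();
                long secs = Duration.between(now, until).getSeconds();
                return secs > 0 ? secs : DO_NOT_CACHE;    // assumption: past date = no cache
            } catch (DateTimeParseException e) {
                return DO_NOT_CACHE;                      // assumption: invalid = expired
            }
        }
        // No Expires: header, so fall back to requestor-cache-max-age (-1 = never).
        return maxAgeSeconds > 0 ? maxAgeSeconds : DO_NOT_CACHE;
    }
}
```

For example, with now at 2024-01-01T00:00:00Z and an Expires: header of "Mon, 01 Jan 2024 00:02:00 GMT", the function returns 120.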
The URL Requestor can determine the destination site
in two ways. First, the context element in the dispatch.xml file may define
an href attribute. The following example shows
how requests for /foo/bar will be proxied to http://remote.server.com/the/file.
<context match="/foo/bar"
matchtype="exact"
requestor="org.morphis.URLRequestor"
href="http://remote.server.com/the/file">
...
</context>
The other way is to omit the href attribute from
the context element and supply the destination in the requesting URL itself.
For example, the following request:
http://my.morphis.server/foo/bar?url=http://remote.server.com/the/file
(Note: strictly speaking, the value of the url parameter
should be URL-encoded.)
would match the following context in dispatch.xml,
and would proxy the data from http://remote.server.com/the/file.
<context match="/foo/bar"
matchtype="exact"
requestor="org.morphis.URLRequestor">
...
</context>
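The two ways of choosing the destination can be sketched as follows. DestinationSketch is hypothetical, and giving the href attribute precedence when both are present is an assumption; the text only describes each method separately.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch: pick the destination URL in one of the two described ways.
class DestinationSketch {
    /**
     * @param contextHref the href attribute of the matched context element, or null
     * @param rawQuery    the raw (still encoded) query string of the incoming request
     */
    static Optional<String> destination(String contextHref, String rawQuery) {
        if (contextHref != null && !contextHref.isEmpty()) {
            return Optional.of(contextHref);   // method 1: href in dispatch.xml
        }
        if (rawQuery == null) return Optional.empty();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("url")) {
                // method 2: decode the url request parameter
                String value = pair.substring(eq + 1);
                return Optional.of(URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();               // no destination available
    }
}
```

This also shows why the url value should be encoded: an unencoded url=http://a.example/b?x=1&y=2 is split at the &, so the sketch would fetch only http://a.example/b?x=1, whereas the encoded form keeps the remote query string intact.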
The advantage of this method is that links in the returned
document can be parsed and prefixed with http://my.morphis.server/foo/bar?url=,
which ensures that every link on the page goes through Morphis.
Morphis WAX provides this feature automatically when the URL Requestor
is used in this manner.
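Rewriting a single link for this purpose amounts to resolving it against the page's own URL and URL-encoding the result into the url parameter. The sketch below covers only that per-link step (finding the links inside the document is out of scope), and it is not how Morphis WAX is implemented; LinkRewriteSketch is a hypothetical name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of routing one link back through Morphis; not WAX source.
class LinkRewriteSketch {
    /**
     * @param proxyPrefix e.g. "http://my.morphis.server/foo/bar?url="
     * @param pageUrl     URL of the document the link was found in
     * @param href        the link as written in the document (may be relative)
     */
    static String rewrite(String proxyPrefix, String pageUrl, String href) {
        // Resolve relative links such as "other.html" against the source page.
        String absolute = URI.create(pageUrl).resolve(href).toString();
        // Encode so that ':', '/', '?', '&' survive inside the url parameter.
        return proxyPrefix + URLEncoder.encode(absolute, StandardCharsets.UTF_8);
    }
}
```

For example, rewrite("http://my.morphis.server/foo/bar?url=", "http://remote.server.com/the/file", "other.html") yields http://my.morphis.server/foo/bar?url=http%3A%2F%2Fremote.server.com%2Fthe%2Fother.html.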
The following dispatch.xml example shows
how incoming requests are matched against the regular expression "^/foo/(.*)$".
The part of the URL matching the parenthesized expression "(.*)"
is substituted for the $1 backreference in the href attribute.
So, if the URL /foo/bar is requested, the URL Requestor
will open a socket to http://foobar.com/bar/foo.xml to obtain
the document to be translated. If the stream does not include an Expires:
header, the document will be cached on the proxy for 2 minutes
(requestor-cache-max-age is set to 120 seconds).
<context match="^/foo/(.*)$"
matchtype="regexp"
requestor="org.morphis.URLRequestor"
href="http://foobar.com/$1/foo.xml">
<parameters>
<param name="requestor-cache-max-age"
value="120"/>
<param name="language"
value="en"/>
...
</parameters>
</context>
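The backreference substitution can be reproduced with java.util.regex, assuming (this document does not confirm it) that Morphis regexp contexts follow Java regular-expression semantics. RegexContextSketch is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a matchtype="regexp" context expanding $1 into href.
class RegexContextSketch {
    static Optional<String> href(String matchRegex, String hrefTemplate, String path) {
        Matcher m = Pattern.compile(matchRegex).matcher(path);
        if (!m.matches()) {
            return Optional.empty();            // this context does not apply
        }
        // The match covers the whole path, so appending the replacement for it
        // yields just the template with $1, $2, ... replaced by captured groups.
        StringBuilder out = new StringBuilder();
        m.appendReplacement(out, hrefTemplate);
        return Optional.of(out.toString());
    }
}
```

Under this assumption, href("^/foo/(.*)$", "http://foobar.com/$1/foo.xml", "/foo/bar") gives http://foobar.com/bar/foo.xml. Because Java replacement strings treat $ and \ specially, a literal dollar sign in an href template would have to be written as \$.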
As shown in the previous example, if the language
processing parameter is set, the URL Requestor sets the Accept-Language:
header to its value, instructing the destination site to provide
content in that language. If this processing parameter is not set, no
Accept-Language: header is sent.
When using the http or https scheme, only GET requests
are supported. POST may be implemented in future releases.
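The Accept-Language: behaviour and the GET-only restriction are illustrated below with the JDK 11+ java.net.http API. This is a sketch under that assumption, not the requestor's actual HTTP code; AcceptLanguageSketch is a hypothetical name. Building the request performs no network I/O.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of the outgoing request; not the URL Requestor's own code.
class AcceptLanguageSketch {
    static HttpRequest buildRequest(String url, String language) {
        // Only GET is used, matching the documented http/https restriction.
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(url)).GET();
        if (language != null && !language.isEmpty()) {
            // Sent only when the "language" processing parameter is set.
            b.header("Accept-Language", language);
        }
        return b.build();
    }
}
```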
URL Requestor Processing Parameters
The following table shows the processing parameters used
by the URL Requestor.
URL Requestor Processing Parameters
Note: if these parameters are not set in dispatch.xml, they
must be set in a prechain filter. Requestors are processed right
after the prechain filters are processed.
Name | Default Value | Description
language | (none) | If set, the Accept-Language request header is set to this value.
url-requestor-cache | true | If set to true (or not set at all), pages retrieved with an Expires: header are cached. Set to false to prevent page caching.
requestor-cache-max-age | -1 | Sets the time, in seconds, that this page should be cached if no other heuristics can be used. Defaults to never (-1). Requestors smart enough to know when their content expires may set this parameter directly. For instance, the org.morphis.URLRequestor looks at the Expires: header to determine when to expire the cache. If that header does not exist, the value of the requestor-cache-max-age parameter is used instead.
appname | morphis | Sets the application name. This name is used to resolve application properties, directories, and logging levels. For more detail, see the General Configuration section.
requestor-user-agent | morphis0.9.2.beta | Sets the User-Agent HTTP header field that the requestor uses when making HTTP requests. A value of "default" uses the User-Agent header of the initial request as the value for this request.
requestor-accepts | */* | Sets the Accept HTTP header field that the requestor uses when making HTTP requests. A value of "default" uses the Accept header of the initial request as the value for this request.
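The "default" convention shared by requestor-user-agent and requestor-accepts can be expressed as a small resolution function. OutgoingHeaderSketch is hypothetical, and falling back to the built-in default when the incoming request lacks the header is an assumption the table does not address.

```java
// Hypothetical sketch of resolving requestor-user-agent / requestor-accepts.
class OutgoingHeaderSketch {
    /**
     * @param paramValue     the processing parameter value, or null if unset
     * @param builtInDefault the default listed in the table, e.g. morphis0.9.2.beta
     * @param incomingValue  the same header from the client's initial request, or null
     */
    static String resolve(String paramValue, String builtInDefault, String incomingValue) {
        if (paramValue == null) {
            return builtInDefault;                  // parameter not set
        }
        if (paramValue.equals("default")) {
            // Pass the client's own header through; fall back if absent (assumption).
            return incomingValue != null ? incomingValue : builtInDefault;
        }
        return paramValue;                          // explicit value from dispatch.xml
    }
}
```

For instance, resolve("default", "*/*", "text/html") returns text/html, while resolve(null, "*/*", "text/html") returns */*.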
The URL Requestor can also obtain additional translation
processing instructions and parameters from the incoming HTTP headers
and from the body of the document itself. See the section on Run-Time
Processing Instructions below.
Examples of Local File Requesting
Documents residing on the local filesystem may be translated
without incurring the overhead of a network call. At first, this may seem
useful only for static documents. However, a static document can
be turned into a dynamic one by using a variety of templating tools
inside the translation process. For example, many XSLT processors support
scripting code written directly in the stylesheet, or extension functions
implemented in separate classes. You may also write your own translation
filters to perform business logic, database calls, and any other dynamic processing.
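As one concrete, standards-based illustration of dynamic output from a static file, XSLT 1.0 stylesheets can declare xsl:param values that are supplied at transformation time. The sketch below calls the JAXP javax.xml.transform API directly; it is not the Morphis translation pipeline, and the stylesheet, the user parameter and the class name XsltParamSketch are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative JAXP sketch: the same static XML yields different output
// depending on a parameter passed in at transformation time.
class XsltParamSketch {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='user' select=\"'guest'\"/>"   // default when not supplied
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/page/greeting'/>, <xsl:value-of select='$user'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String xml, String user) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        if (user != null) {
            t.setParameter("user", user);               // the per-request input
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Rendering <page><greeting>Hello</greeting></page> with user set to alice produces "Hello, alice"; without the parameter it produces "Hello, guest".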
The URL Requestor takes the file: URL specified in the href attribute of
<context requestor="org.morphis.URLRequestor" href="file-path"...>
and loads the file from disk.
The following example illustrates how all requests that
match the regular expression "^/foo/(.*)$" are
processed using the file foo.xml (file:///docs/foo.xml).
<context match="^/foo/(.*)$"
matchtype="regexp"
requestor="org.morphis.URLRequestor"
href="file:///docs/foo.xml">
...
</context>
If the file is an XML document, it may include additional
processing instructions as shown in Run-Time Processing
Instructions below.