morphis | wax
morphis manual

Requestors: URL Requestor

Morphis includes the URL Requestor, which, given a URL to a document, fetches the document's contents and sends them to a Translator. The supported URL schemes include http, https, ftp and file.

The URL Requestor is useful for translating documents residing on other servers (via http:// or ftp://), or even on the local filesystem (file://). It is especially useful for obtaining translation source streams from other web content management systems.
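The fetch step can be pictured with a short Java sketch, assuming a JDK of version 9 or later. The class UrlFetchSketch and its methods are hypothetical illustrations, not part of the Morphis API. Decoding the document as UTF-8 is also an assumption; a real requestor would honor the document's declared encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the Morphis API: fetches a document by URL,
// accepting only the schemes the URL Requestor is documented to support.
final class UrlFetchSketch {
    static final Set<String> SUPPORTED_SCHEMES = Set.of("http", "https", "ftp", "file");

    private UrlFetchSketch() {}

    // True if the URL's scheme is one of http, https, ftp or file.
    static boolean isSupported(String url) {
        String scheme = URI.create(url).getScheme();
        return scheme != null && SUPPORTED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Reads the whole document; decoding as UTF-8 is an assumption of this sketch.
    static String fetch(String url) throws IOException {
        if (!isSupported(url)) {
            throw new IllegalArgumentException("Unsupported URL scheme: " + url);
        }
        URL target = URI.create(url).toURL();
        try (InputStream in = target.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

For example, UrlFetchSketch.fetch("file:///docs/foo.xml") would return the text of a local file, while a mailto: URL is rejected before any I/O takes place.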

Note that the Java Secure Socket Extension (JSSE) must be installed for the https scheme to work. It can be downloaded from the Java website.

When using the HTTP scheme, the following additional features are provided:

  • Session cookies: cookies set by the destination server will be temporarily stored by Morphis, and sent back to the destination server during appropriate requests. This is implemented by keeping session state with the client. The HTTP proxy requestor does not currently support persistent cookies set by the destination server.

  • Caching: If the stream from the destination server includes an Expires: header set in the future, the HTTP proxy caches the stream until that date is reached. If the Expires: header is not set, you can force the page to be cached by setting the requestor-cache-max-age processing parameter. Caching is enabled by default; to disable it, set the url-requestor-cache processing parameter to false. The HTTP proxy requestor also refuses to cache documents whose Cache-Control: header includes no-cache.

  • Redirection: An HTTP stream returning a 3xx status code will automatically be re-retrieved using the Location: header.
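The caching rules above can be summarized as a single decision function. The sketch below is a hypothetical reconstruction from this description (CachePolicySketch and its parameters are illustrative, not Morphis code). It assumes the Expires: header has already been parsed into epoch milliseconds, and it ignores other Cache-Control directives such as max-age.

```java
import java.util.Locale;

// Hypothetical reconstruction of the documented caching rules; not Morphis code.
final class CachePolicySketch {
    // Sentinel meaning "do not cache this response".
    static final long NO_CACHE = -1L;

    private CachePolicySketch() {}

    // Returns the time (epoch millis) until which a response may be cached, or NO_CACHE.
    //   expiresMillis: parsed Expires: header, or null if the header is absent
    //   cacheControl:  raw Cache-Control: header, or null if absent
    //   cacheEnabled:  the url-requestor-cache parameter (default true)
    //   maxAgeSeconds: the requestor-cache-max-age parameter (default -1, never)
    static long cacheUntil(long nowMillis, Long expiresMillis, String cacheControl,
                           boolean cacheEnabled, long maxAgeSeconds) {
        if (!cacheEnabled) {
            return NO_CACHE;
        }
        // Never cache when Cache-Control includes no-cache.
        if (cacheControl != null
                && cacheControl.toLowerCase(Locale.ROOT).contains("no-cache")) {
            return NO_CACHE;
        }
        // An Expires: header decides on its own: cache only if it lies in the future.
        if (expiresMillis != null) {
            return expiresMillis > nowMillis ? expiresMillis : NO_CACHE;
        }
        // No Expires: header, so fall back to requestor-cache-max-age.
        if (maxAgeSeconds > 0) {
            return nowMillis + maxAgeSeconds * 1000L;
        }
        return NO_CACHE;
    }
}
```

With requestor-cache-max-age set to 120 and no Expires: header, the function returns a time two minutes after the request, matching the dispatch.xml example later in this section.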

The URL Requestor may determine the destination site in two separate ways. First, the dispatch.xml file may define an href attribute in the context element. The following example shows how requests for /foo/bar will be proxied to http://remote.server.com/the/file.

<context match="/foo/bar"
    matchtype="exact"
    requestor="org.morphis.URLRequestor"
    href="http://remote.server.com/the/file">
    ...
</context>

The other way is to omit the href attribute from the context element and supply the destination URL in the request itself, as the url query parameter. For example, the following request:

http://my.morphis.server/foo/bar?url=http://remote.server.com/the/file
(Note: strictly speaking, the value of the url parameter should be URL-encoded.)

would match the following context in dispatch.xml, and would proxy the data from http://remote.server.com/the/file.

<context match="/foo/bar"
    matchtype="exact"
    requestor="org.morphis.URLRequestor">
    ...
</context>

The advantage of this method is that links in the returned document can be parsed and prefixed with http://my.morphis.server/foo/bar?url=, so that every link on the page is routed back through Morphis. Morphis WAX provides this feature automatically when the URL Requestor is used in this manner.
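Prefixing a link involves two steps: resolving it against the URL of the document it came from, and URL-encoding the result as the value of the url parameter. The sketch below shows this for a single link. LinkRewriteSketch is a hypothetical name, not the Morphis WAX implementation. The sketch assumes Java 10 or later for URLEncoder.encode(String, Charset), and it does not parse HTML, which a full implementation would need to do.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of link rewriting; not the Morphis WAX implementation.
final class LinkRewriteSketch {
    private LinkRewriteSketch() {}

    // proxyBase:   the Morphis context URL, e.g. http://my.morphis.server/foo/bar
    // documentUrl: the URL the current document was fetched from
    // href:        a link found in that document (absolute or relative)
    static String rewrite(String proxyBase, String documentUrl, String href) {
        // Make relative links absolute so they can be fetched later.
        String absolute = URI.create(documentUrl).resolve(href).toString();
        // Encode the target so ':', '/', '?', '=' and '&' survive as one parameter value.
        return proxyBase + "?url=" + URLEncoder.encode(absolute, StandardCharsets.UTF_8);
    }
}
```

For instance, rewrite("http://my.morphis.server/foo/bar", "http://remote.server.com/the/file", "other.html") returns http://my.morphis.server/foo/bar?url=http%3A%2F%2Fremote.server.com%2Fthe%2Fother.html.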

The following dispatch.xml example matches incoming requests against the regular expression "^/foo/(.*)$". The part of the URL matched by the parenthesized expression "(.*)" is substituted for the $1 backreference in the href attribute. So, if the URL /foo/bar is requested, the URL Requestor opens a socket to http://foobar.com/bar/foo.xml to obtain the document to be translated. If the stream does not include an Expires: header, the document is cached on the proxy for 2 minutes (requestor-cache-max-age is set to 120 seconds).

<context match="^/foo/(.*)$"
    matchtype="regexp"
    requestor="org.morphis.URLRequestor"
    href="http://foobar.com/$1/foo.xml">
    <parameters>
        <param name="requestor-cache-max-age" value="120"/>
        <param name="language" value="en"/>
        ...
    </parameters>
</context>
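The match, matchtype and href attributes together turn a request path into a source URL. A minimal Java sketch of that mapping follows. It assumes a regexp context must match the whole path (the pattern above is anchored with ^ and $, so this makes no difference here) and that $n in href refers to the nth captured group, as in java.util.regex replacement strings. ContextMatchSketch is hypothetical, not the Morphis dispatcher, and it needs Java 9 or later for Matcher.appendReplacement(StringBuilder, String).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of context matching; not the actual Morphis dispatcher.
final class ContextMatchSketch {
    private ContextMatchSketch() {}

    // Returns the href to fetch for the given request path, or null when the
    // context does not match. Supports matchtype "exact" and "regexp".
    static String resolveHref(String match, String matchtype, String href, String path) {
        switch (matchtype) {
            case "exact":
                return match.equals(path) ? href : null;
            case "regexp": {
                Matcher m = Pattern.compile(match).matcher(path);
                if (!m.matches()) {
                    return null;
                }
                // Expand $1, $2, ... in href using the groups of the current match.
                // matches() consumed the whole path, so no text precedes the match.
                StringBuilder out = new StringBuilder();
                m.appendReplacement(out, href);
                return out.toString();
            }
            default:
                throw new IllegalArgumentException("Unknown matchtype: " + matchtype);
        }
    }
}
```

With the context above, resolveHref("^/foo/(.*)$", "regexp", "http://foobar.com/$1/foo.xml", "/foo/bar") yields http://foobar.com/bar/foo.xml.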

As the dispatch.xml example above shows, the language processing parameter can be set per context. If it is set, the URL Requestor sets the Accept-Language: header to its value, asking the destination site to provide content in that language. If it is not set, no Accept-Language: header is sent.

When using the http or https scheme, only GET requests are supported. POST may be implemented in future releases.
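The language handling and the GET-only restriction map naturally onto java.net.HttpURLConnection. The sketch below prepares, but does not send, such a request. HttpGetSketch is a hypothetical name, and this manual does not state whether Morphis itself uses HttpURLConnection. Note that HttpURLConnection follows redirects automatically only within the same protocol (for example, not from http to https).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical sketch; not the Morphis URL Requestor itself.
final class HttpGetSketch {
    private HttpGetSketch() {}

    // Prepares (but does not send) a GET request. Accept-Language is added only
    // when language is non-null, mirroring the language processing parameter.
    static HttpURLConnection prepareGet(String url, String language) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        if (!(conn instanceof HttpURLConnection)) {
            throw new IllegalArgumentException("Not an http(s) URL: " + url);
        }
        HttpURLConnection http = (HttpURLConnection) conn;
        http.setRequestMethod("GET");            // only GET is supported
        http.setInstanceFollowRedirects(true);   // re-request 3xx responses via Location:
        if (language != null) {
            http.setRequestProperty("Accept-Language", language);
        }
        return http;
    }
}
```

Calling connect() or getInputStream() on the returned connection would then send the request.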

URL Requestor Processing Parameters

The URL Requestor uses the following processing parameters.

Note: parameters that are not set in dispatch.xml must be set in a prechain filter, because requestors are processed immediately after the prechain filters.

  • language (default: none): If set, the Accept-Language: request header is set to this value.

  • url-requestor-cache (default: true): If true (or not set at all), pages retrieved with an Expires: header are cached. Set to false to prevent page caching.

  • requestor-cache-max-age (default: -1): The time, in seconds, for which a page is cached when no other heuristic can be used. The default, -1, means never. Requestors that know when their content expires may set this parameter directly. For instance, org.morphis.URLRequestor uses the Expires: header to determine when the cached copy expires; if the header does not exist, the value of requestor-cache-max-age is used instead.

  • appname (default: morphis): Sets the application name, which is used to resolve application properties, directories, and logging levels. For more detail, see the General Configuration section.

  • requestor-user-agent (default: morphis0.9.2.beta): The User-Agent header the requestor sends when making HTTP requests. A value of "default" reuses the User-Agent header of the initial request.

  • requestor-accepts (default: */*): The Accept header the requestor sends when making HTTP requests. A value of "default" reuses the Accept header of the initial request.
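The "default" convention for requestor-user-agent and requestor-accepts amounts to a simple fallback rule. The sketch below assumes the processing parameters and the incoming request's headers are available as plain maps keyed by exact header name; RequestorHeaderSketch is hypothetical, not Morphis code. When "default" is requested but the client sent no such header, the sketch omits the header, which is an assumption rather than documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how requestor-user-agent and requestor-accepts
// could become outgoing HTTP headers; not Morphis code.
final class RequestorHeaderSketch {
    private RequestorHeaderSketch() {}

    static Map<String, String> outgoingHeaders(Map<String, String> params,
                                               Map<String, String> incoming) {
        Map<String, String> out = new LinkedHashMap<>();
        put(out, "User-Agent",
            resolve(params.getOrDefault("requestor-user-agent", "morphis0.9.2.beta"),
                    incoming.get("User-Agent")));
        put(out, "Accept",
            resolve(params.getOrDefault("requestor-accepts", "*/*"),
                    incoming.get("Accept")));
        return out;
    }

    // "default" means: reuse the value from the client's original request.
    private static String resolve(String configured, String fromClient) {
        return "default".equals(configured) ? fromClient : configured;
    }

    // Skips headers that end up without a value.
    private static void put(Map<String, String> out, String name, String value) {
        if (value != null) {
            out.put(name, value);
        }
    }
}
```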

The URL Requestor can also obtain additional translation processing instructions and parameters from the incoming HTTP headers and from the body of the document itself. See the section on Run-Time Processing Instructions below.

Examples of Local File Requesting

Documents residing on the local filesystem can be translated without the overhead of a network call. At first this may seem useful only for static documents. However, a static document can be made dynamic by using a variety of templating tools inside the translation process. For example, many XSLT processors allow script code to be embedded directly in the stylesheet or supplied through separate extension classes. You may also write your own translation filters to perform business logic, database calls, and any other dynamic processing.

The URL Requestor takes the file: URL specified in the href attribute of <context requestor="org.morphis.URLRequestor" href="file-path"...> and loads the file directly from disk.

The following example shows how all requests matching the regular expression "^/foo/(.*)$" are processed using the local file /docs/foo.xml.

<context match="^/foo/(.*)$"
    matchtype="regexp"
    requestor="org.morphis.URLRequestor"
    href="file:///docs/foo.xml">
    ...
</context>

If the file is an XML document, it may include additional processing instructions as shown in Run-Time Processing Instructions below.
