The XRay HTTP Monitoring Client (xray-0.5)
Availability
XRay has been made available under the terms of the
GNU General Public License (GPL).
Download
Installation
XRay is trivial to install:
- Extract the jar file to an empty directory where XRay will live.
- Run XRay under Java 1.3 or later, using the current main class name, HttpMonitor.
- Configure your web browser to use HttpMonitor as a web proxy as explained below.
Description
This description covers only functionality that has already been
implemented and extensively tested over a period of two years. It does not
address the entire founding purpose of XRay, which is dealt with in later
sections of this document.
XRay is a 100% pure Java, GUI-enhanced
tool for monitoring and logging all HTTP messages that pass between any
number of arbitrary web clients and any number of arbitrary web servers. A
treeview control maintains the association between each HTTP request and the
server's response to that request, while a text window simultaneously records
all headers and fields of all messages, preserving the actual order in which
the messages were sent. The entire content of the text window may be saved as an
ordinary text file at any time, and individual messages or sequences of messages
may be copied to the system clipboard for use by external
applications. XRay also embeds an instance of JEditorPane, which functions
as an internal web browser. Internal requests (requests originating from the
embedded client, which Sun has implemented as part of JEditorPane) may,
at the discretion of the user, be captured and proxied in the same way as
external requests (requests originating from a web browser such as
Mozilla or Internet Explorer). Finally, there is an option to write all
informational and error messages to the console.
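To make the proxying idea concrete, here is a minimal sketch of one step that any forwarding HTTP proxy has to perform: working out the target server from the request line that a proxy-configured browser sends. This is not XRay's code. The class name ProxyRequestLine and its fields are hypothetical, it is written in modern Java rather than Java 1.3, and it assumes plain http requests with an absolute URI (tunnelled CONNECT requests are out of scope).

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of XRay): parses the request line a browser sends to an HTTP proxy. */
final class ProxyRequestLine {
    final String method;
    final String host;
    final int port;
    final String pathAndQuery;
    final String version;

    private ProxyRequestLine(String method, String host, int port,
                             String pathAndQuery, String version) {
        this.method = method;
        this.host = host;
        this.port = port;
        this.pathAndQuery = pathAndQuery;
        this.version = version;
    }

    /** Parses a line such as "GET http://example.com:8080/a?x=1 HTTP/1.1". */
    static ProxyRequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        URI uri;
        try {
            uri = new URI(parts[1]);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad request target: " + parts[1], e);
        }
        if (uri.getHost() == null) {
            // A proxy needs an absolute URI to know which server to contact.
            throw new IllegalArgumentException("Expected an absolute URI: " + parts[1]);
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort(); // default HTTP port
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new ProxyRequestLine(parts[0], uri.getHost(), port, path, parts[2]);
    }

    /** The same request line with the target reduced to a path, as sent directly to a server. */
    String toOriginForm() {
        return method + " " + pathAndQuery + " " + version;
    }
}
```

A relay built on this would open a socket to host and port, forward the request and its headers, and copy the response back to the client while logging both.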
How to use XRay
- Configure any number of web browsers (practically
speaking, this number is probably 1) to use XRay as a proxy through some port
on the machine on which XRay will run, say 3002. This is typically done via the
main menu of your web browser. For example, if you are using Mozilla, the
sequence of mouse clicks is Edit->Preferences->Advanced->Proxies. Then enable
the "Manual Configuration" radio button and type the host name of the machine
running XRay and the port "3002" (or whatever port you desire) into the
appropriate text boxes, which are clear from the arrangement of the dialog. The
process is essentially the same for all major web browsers.
- Start XRay by typing the following (the old class name is still in effect):
java HttpMonitor
- Choose Options->Set Port through XRay's main menu, and type in the port
number you used in step 1.
- Press the Start tool button to start the monitor.
- If necessary, generate some data by using the attached client(s) to browse the
web.
- You may stop the monitor at any time by pressing the Stop tool
button. While the monitor is stopped, connected clients will no longer have
access to the internet. To restart the monitor, press Start again.
- Experiment with the menu to
familiarize yourself with XRay's features, most, if not all, of which are covered
in this document.
- When you are done monitoring, exit XRay by
pressing the Exit tool button (the standard exit box in the top right-hand corner
has been disabled).
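The proxy configuration in step 1 is not limited to browsers. As a rough sketch, a Java program (for instance a web scraper under development) could route its requests through a running XRay instance as follows. This is not part of XRay. The class and method names are hypothetical, the code uses the standard java.net classes from Java 5 and later, and it assumes XRay is listening on localhost at port 3002.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

/** Hypothetical client (not part of XRay) that sends its HTTP traffic through a local proxy. */
final class XRayProxyClient {

    /** Builds an HTTP proxy descriptor for the given host and port. */
    static Proxy proxyFor(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** Requests the URL through the proxy and returns the HTTP status code. */
    static int fetchStatus(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: XRay has been started and is monitoring port 3002 on this machine.
        Proxy proxy = proxyFor("localhost", 3002);
        System.out.println(fetchStatus("http://example.com/", proxy));
    }
}
```

If the monitor is running, the request and its response should then show up in XRay's text window and message tree like any browser-generated traffic.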
Currently Implemented Features
- Full capture of all HTTP message content, including the
headers and the body.
- Text-based logging of all monitored HTTP
conversations.
- Extraction and recording (via the tree view control) of all host IP
addresses.
- Limited web browsing capabilities through the embedded instance of
JEditorPane.
- There is an option to capture the HTTP conversation between the embedded
browser and the remote server, subject to the limitations mentioned in this
document.
- Transparency: from the point of view of the client, XRay
acts as a mere relay. It is invisible to the client application with the
exception of the required configuration steps enumerated above. Because XRay
simply forwards all message headers to the server, it is (or at least should be)
also invisible to the server.
- Because XRay is 100% Java, it is
platform independent. XRay has been tested extensively under both Windows 2000
and Linux.
- XRay is GUI enabled, so you won't have to fuss with the
command line to capture messages.
- The tree view control preserves the
association between a request and its response, while grouping all messages
associated with the same remote server together.
- A pop-up menu
associated with the tree view control supports instant access to the headers and
message body of a displayed message.
- XRay need not run on the same
machine as the web browser(s) it tracks.
- The GUI currently supports the following options and capabilities (this
list may not be complete):
- Setting of the default user data directory.
- Saving and restoring the content of the text window.
- Full cut/copy/paste functionality for applicable windows and
dialogs.
- Resetting the port at which XRay listens for client
requests. You may not reset the port while the monitor is
running.
- Optional suppression of server redirects and 'Not
Modified' messages.
- Easy access through the tree view component
to all message headers and their associated text or raw binary content.
Simply right-click the desired message on the message tree.
- Starting and stopping the monitor.
- The option to capture (proxy) all messages associated with "internal"
requests generated through the embedded web browser
(JEditorPane).
- Purging and clearing of message tree (treeview)
content and data. Purging the tree removes cached connections from memory
and writes them to disk. Clearing the tree removes all visible content and
subsequently purges the tree.
- Clearing the text window
(log).
- Navigating to an arbitrary http URL.
- Exiting XRay.
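Two of the features above, grouping request/response pairs by remote server and optionally suppressing redirects and 'Not Modified' responses, can be pictured with a small data structure. The following is only an illustrative sketch and makes no claim about how XRay stores its message tree. The class MessageTreeModel, its nested Exchange type, and the exact set of status codes treated as redirects are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model (not XRay's classes): exchanges grouped by remote host. */
final class MessageTreeModel {

    /** One request paired with the status code of its response. */
    static final class Exchange {
        final String requestLine;
        final int status;

        Exchange(String requestLine, int status) {
            this.requestLine = requestLine;
            this.status = status;
        }
    }

    // LinkedHashMap keeps hosts in the order in which they were first seen.
    private final Map<String, List<Exchange>> byHost = new LinkedHashMap<>();
    private final boolean suppressRedirectsAndNotModified;

    MessageTreeModel(boolean suppressRedirectsAndNotModified) {
        this.suppressRedirectsAndNotModified = suppressRedirectsAndNotModified;
    }

    /** Assumption: these codes count as "redirect" (301, 302, 303, 307, 308) or "Not Modified" (304). */
    static boolean isRedirectOrNotModified(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 304 || status == 307 || status == 308;
    }

    /** Records an exchange under its host; returns false if it was suppressed. */
    boolean record(String host, String requestLine, int status) {
        if (suppressRedirectsAndNotModified && isRedirectOrNotModified(status)) {
            return false;
        }
        byHost.computeIfAbsent(host, h -> new ArrayList<>())
              .add(new Exchange(requestLine, status));
        return true;
    }

    /** Hosts in first-seen order: the top-level nodes of the tree. */
    List<String> hosts() {
        return new ArrayList<>(byHost.keySet());
    }

    /** Exchanges recorded for one host, in arrival order: the child nodes. */
    List<Exchange> exchangesFor(String host) {
        List<Exchange> list = byHost.get(host);
        return list == null
                ? Collections.<Exchange>emptyList()
                : Collections.unmodifiableList(list);
    }
}
```

A GUI tree would render each entry of hosts() as a parent node and each Exchange as a child, with the pop-up menu looking up headers and body from the selected exchange.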
Limitations
- Be warned that XRay is currently at the alpha
stage. In particular, the internal web browser is unresponsive at times and does
not fully support graphics, forms, or attributed text. Despite these defects
(and others not mentioned here), XRay is currently usable as a web monitor and
is particularly useful as an aid in the development of web-scraping
applications.
- As I have intimated, the internal web browser, while
not essential to the usefulness of XRay, is probably too buggy to contribute
much to the utility of the application in its present form. If you are
downloading XRay for use as a web browser, you will be disappointed. A lot of
work remains to be done here (details below).
- Data collection will
slow down your web browser considerably. You will not want to use it all the
time, at least until the threading model has been improved in a future release.
- The perceived speed may vary depending on how your browser is
configured, and on which browser you use. For example, Mozilla appears to cause
XRay to run more slowly than Galeon does.
- I have experienced
problems getting Konqueror to recognize XRay, even when it is configured
properly to use XRay as a proxy.
- The user interface occasionally
fails to respond due to threading problems, which appear to have worsened since
the addition of the self-proxying capability. I am currently investigating this
matter.
- XRay is not a fully HTTP 1.1 compliant proxy, but I have not
experienced difficulties capturing the full content of any message, at least to
my knowledge.
- On the bright side, XRay almost never crashes, but it
does sometimes write stack traces to the console. Many of these errors are
caused by misbehavior on the part of the server, but some traces do appear to
indicate coding errors and require future investigation.
- Very occasionally, a high-security server, such as a banking application,
appears to see XRay and refuses to respond. This is a mystery to me. On the
other hand, I have used XRay under Internet Explorer to monitor my own online
banking transactions.
- If client-side caching of images is in effect, images
will not be displayed in the internal browser window. An option to retrieve
these images should be implemented. Any sort of cached content will exhibit this
behavior.
- Purged connections are not removed from the disk when the
application exits. This lacuna can clutter your data directory with useless
connection files. I should really fix this.
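One possible way to address the leftover connection files, sketched here purely as an illustration, is to delete them from a JVM shutdown hook. Nothing in this sketch comes from XRay's code base: the class name PurgedFileCleanup is hypothetical, and the ".conn" suffix used in the usage note is an invented stand-in for however purged connection files are actually named.

```java
import java.io.File;

/** Hypothetical cleanup helper (not part of XRay). */
final class PurgedFileCleanup {

    /**
     * Deletes regular files in dataDir whose names end with the given suffix.
     * Returns the number of files actually deleted.
     */
    static int deletePurged(File dataDir, String suffix) {
        File[] files = dataDir.listFiles((dir, name) -> name.endsWith(suffix));
        if (files == null) {
            return 0; // directory is missing or unreadable
        }
        int deleted = 0;
        for (File f : files) {
            if (f.isFile() && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Registers a hook so that the cleanup runs when the JVM shuts down normally. */
    static void installShutdownHook(File dataDir, String suffix) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> deletePurged(dataDir, suffix)));
    }
}
```

A call such as PurgedFileCleanup.installShutdownHook(new File("data"), ".conn") at startup would then clear the matching files on a normal exit. Both arguments are placeholders, and the hook will not run if the JVM is killed abruptly.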
Future Plans
The original concept out of which XRay has devolved was that of a
fully functional web browser with interactive web-scraping and robot generation
capabilities. Users should be able to generate flexible content retrieval robots
on the basis of live data issuing from all the local clients and remote servers
for which XRay acts as a kind of web proxy, while simultaneously
browsing the web. XRay is thereby "self-proxying" in the sense that HTTP
conversations initiated from the GUI are processed as fully as those initiated
from an external client. I regard this novel arrangement as the ideal web
scraping environment. After all, manually browsing the web is simply a lowest
common denominator, primitive form of web scraping. I believe that recent
literature that casts this functionality in a pejorative light is benighted: any
useful activity that can be automated, of which extracting content from the web
is a central example, should be automated to the maximum extent possible. The
main purpose of XRay has been, and remains, to fulfill this need. To the
best of my knowledge, the XRay concept is unique in its intention to integrate
both internal and external client activity as two essential parts of a seamless
content extraction utility, but these requirements raise a number of difficult
issues:
- Sun's answer to Microsoft's web browser control is somewhat
pathetic and does not in fact do justice to the notion of a reasonably adequate
web browser. It is very likely that we will need to replace the current
implementation of the internal client.
- Conceptual issues having to
do with how to coordinate capture and manipulation of internal and external HTTP
messages need to be articulated and addressed at multiple levels of
analysis.
- The existing code base is practically (but not completely
or absolutely) undocumented.
- The existing code base is not well
designed and will require extensive refactoring before any new functionality can
be added. In fact, a total rewrite may be in order.
- Ideally, future
work on XRay will embody current best practices in software development, drawing
upon principles of Extreme Programming, Agile Methods, Aspect Oriented
Programming, and design patterns.
Request For Help
For the reasons given, this project will not survive a battery of
piecemeal, ad hoc, uncoordinated changes. Therefore, I am initially looking for
a small number of experienced developers to double as collaborators and mentors.
High-to-mid level design skills are as important as Java coding skills, and past
experience writing multithreaded networking applications and/or multithreaded
GUIs (preferably both) in Java is pretty much mandatory. Together we will solve
the five problems enumerated above.
SourceForge Project Summary Page
Other Projects
Author: Ben Tompkins
brtompkins@comcast.net