The XRay HTTP Monitoring Client (xray-0.5)

A screenshot of XRay in action.

Availability

XRay has been made available under the terms of the GNU General Public License (GPL).

Download


Installation

XRay is trivial to install:

  1. Extract the jar file to an empty directory where XRay will live.
  2. Run XRay under Java 1.3+ using the current main class name, HttpMonitor.
  3. Configure your web browser to use HttpMonitor as a web proxy as explained below.

Description

This description covers only the functionality that has already been implemented and extensively tested over a period of two years. It does not address the entire founding purpose of XRay, which is dealt with in later sections of this document.

XRay is a 100% pure Java, GUI-enhanced tool for monitoring and logging all HTTP messages that pass between any number of arbitrary web clients and any number of arbitrary web servers. A tree view control maintains the association between each HTTP request and the server's response to that request, while a text window simultaneously records all headers and fields of all messages, preserving the actual order in which messages were sent. The entire content of the text window may be saved as an ordinary text file at any time, and individual messages or sequences thereof may be copied to the system clipboard for use by external applications. XRay also embeds an instance of JEditorPane, which functions as an internal web browser. Internal requests (requests originating from the embedded client, which Sun implemented as part of JEditorPane) may, at the discretion of the user, be captured and proxied in the same manner as external requests (requests originating from a web browser such as Mozilla or Internet Explorer). Finally, there is an option to write all informational and error messages to the console.


How to use XRay

  1. Configure any number of web browsers (practically speaking, this number is probably 1) to use XRay as a proxy through some port on the machine on which XRay will run, say 3002. This is typically done via the main menu of your web browser. For example, if you are using Mozilla, the sequence of mouse clicks is: Edit->Preferences->Advanced->Proxies. Then enable the "Manual Configuration" radio button and type the name of the machine running XRay and "3002" (or whatever port you desire) into the appropriate text boxes, which are clear from the arrangement of the dialog. The process is essentially the same for all major web browsers.

  2. Start XRay by typing (the old class name is still in effect):

    java HttpMonitor

  3. Choose Options->Set Port through XRay's main menu, and type in the port number you used in step 1.

  4. Press the Start tool button to start the monitor.

  5. If necessary, generate some data by using the attached client(s) to browse the web.

  6. You may stop the monitor at any time by pressing the Stop tool button. Connected clients will no longer have access to the internet. To restart the monitor, press Start again.

  7. Experiment with the menu to familiarize yourself with XRay's features, most, if not all, of which are covered in this document.

  8. When you are done monitoring, exit XRay by pressing the Exit tool button (the standard exit box in the top right-hand corner has been disabled).
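
Clients other than web browsers can be attached in the same way. The following is a minimal sketch, not part of XRay itself, of how a Java program (a web-scraping application, for example) might route its requests through a running monitor. The class name XRayClientSketch, the host "localhost", and the URL are assumptions made for illustration; port 3002 is the example port from step 1. Note that java.net.Proxy requires Java 5 or later, which is newer than the Java 1.3 minimum needed to run XRay.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper class; nothing here ships with XRay.
class XRayClientSketch {

    // Describe the machine and port where XRay is listening.
    // The address is left unresolved so no DNS lookup happens here.
    static Proxy xrayProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Prepare an HTTP connection that will be sent through the given proxy.
    // No network traffic occurs until the connection is actually used.
    static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: XRay is running on this machine and was started on port 3002.
        Proxy proxy = xrayProxy("localhost", 3002);
        String url = args.length > 0 ? args[0] : "http://example.com/";
        HttpURLConnection conn = open(url, proxy);
        try {
            // Triggers the request; it should then appear in XRay's log and tree view.
            System.out.println("Status: " + conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

If the monitor is stopped or listening on a different port, the call to getResponseCode() will fail with a connection error, which matches the behavior described in step 6.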

Currently Implemented Features

  1. Full capture of all HTTP message content, including the headers and the body.

  2. Text-based logging of all monitored HTTP conversations.

  3. Extraction and recording (via the tree view control) of all host IP addresses.

  4. Limited web browsing capabilities through the embedded instance of JEditorPane.

  5. There is an option to capture the HTTP conversation between the embedded browser and the remote server, subject to the limitations mentioned in this document.

  6. Transparency: from the point of view of the client XRay acts as a mere relay. It is invisible to the client application with the exception of the required configuration steps enumerated above. Because XRay simply forwards all message headers to the server it is (or at least should be) also invisible to the server.

  7. Because XRay is 100% Java, it is platform independent. XRay has been tested extensively under both Windows 2000 and Linux.

  8. XRay is GUI enabled, so you won't have to fuss with the command line to capture messages.

  9. The tree view control preserves the association between a request and its response, while grouping all messages associated with the same remote server together.

  10. A pop-up menu associated with the tree view control supports instant access to the headers and message body of a displayed message.

  11. XRay need not run on the same machine as the web browser(s) it tracks.

  12. The GUI currently supports the following options and capabilities (this list may not be complete):
    1. Setting of the default user data directory.

    2. Saving and restoring the content of the text window.

    3. Full cut/copy/paste functionality for applicable windows and dialogs.

    4. Resetting the port at which XRay listens for client requests. You may not reset the port while the monitor is running.

    5. Optional suppression of server redirects and 'Not Modified' messages.

    6. Easy access through the tree view component to all message headers and their associated text or raw binary content. Simply right-click the desired message on the message tree.

    7. Starting and stopping the monitor.

    8. The option to capture (proxy) all messages associated with "internal" requests generated through the embedded web browser (JEditorPane).

    9. Purging and clearing of message tree (treeview) content and data. Purging the tree removes cached connections from memory and writes them to disk. Clearing the tree removes all visible content and subsequently purges the tree.

    10. Clearing the text window (log).

    11. Navigating to an arbitrary http URL.

    12. Exiting XRay.
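
To make the grouping behavior of the tree view (feature 9) and the optional suppression of redirects and 'Not Modified' messages more concrete, here is a small illustrative sketch. It is not taken from the XRay source: the class MessageTreeSketch and all of its methods are hypothetical, it groups by host name rather than by IP address, and it assumes that "redirects and 'Not Modified'" means any 3xx status code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the message tree; XRay's real classes may differ.
class MessageTreeSketch {

    // host -> "request line => status line" entries, in arrival order
    private final Map<String, List<String>> byHost = new LinkedHashMap<>();

    // Extract the host from a proxy-style request line such as
    // "GET http://example.com/index.html HTTP/1.0".
    static String hostOf(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        String host = URI.create(parts[1]).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in request target: " + parts[1]);
        }
        return host.toLowerCase(Locale.ROOT);
    }

    // Extract the numeric status from a status line such as "HTTP/1.1 304 Not Modified".
    static int statusOf(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+", 3);
        return Integer.parseInt(parts[1]);
    }

    // Assumption: every 3xx response (redirects and 304 Not Modified) is suppressible.
    static boolean isSuppressible(int status) {
        return status >= 300 && status < 400;
    }

    // Pair a request with its response and file the pair under the remote host.
    void record(String requestLine, String statusLine, boolean suppress) {
        if (suppress && isSuppressible(statusOf(statusLine))) {
            return; // the user asked not to see this kind of message
        }
        byHost.computeIfAbsent(hostOf(requestLine), h -> new ArrayList<>())
              .add(requestLine + " => " + statusLine);
    }

    Map<String, List<String>> view() {
        return byHost;
    }

    public static void main(String[] args) {
        MessageTreeSketch tree = new MessageTreeSketch();
        tree.record("GET http://example.com/a HTTP/1.1", "HTTP/1.1 200 OK", true);
        tree.record("GET http://example.com/b HTTP/1.1", "HTTP/1.1 304 Not Modified", true);
        tree.record("GET http://other.test/ HTTP/1.1", "HTTP/1.1 302 Found", false);
        for (Map.Entry<String, List<String>> e : tree.view().entrySet()) {
            System.out.println(e.getKey());
            for (String pair : e.getValue()) {
                System.out.println("    " + pair);
            }
        }
    }
}
```

A LinkedHashMap is used so that hosts are listed in the order in which they were first contacted, mirroring the way the text window preserves message order.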

Limitations

  1. The user should be warned that XRay is currently in the alpha stage. In particular, the internal web browser is unresponsive at times and does not fully support graphics, forms, or attributed text. Despite these defects (and others not mentioned here), XRay is currently usable as a web monitor and is particularly useful as an aid in the development of web-scraping applications.

  2. As I have intimated, the internal web browser, while not essential to the usefulness of XRay, is probably too buggy to contribute much to the utility of the application in its present form. If you are downloading XRay for use as a web browser, you will be disappointed. A lot of work remains to be done here (details below).

  3. Data collection will slow down your web browser considerably. You will not want to use it all the time, at least until the threading model has been improved in a future release.

  4. The perceived speed may vary depending on how your browser is configured, and on which browser you use. For example Mozilla appears to cause XRay to run more slowly than does Galeon.

  5. I have experienced problems getting Konqueror to recognize XRay, even when it is configured properly to use XRay as a proxy.

  6. The user interface occasionally fails to respond due to threading problems, which appear to have worsened since the addition of the self-proxying capability. I am currently investigating this matter.

  7. XRay is not a fully HTTP 1.1 compliant proxy, but I have not experienced difficulties capturing the full content of any message, at least to my knowledge.

  8. On the bright side, XRay almost never crashes, but it does sometimes write stack traces to the console. Many of these errors are caused by misbehavior on the part of the server, but some traces do appear indicative of coding errors and require future investigation.

  9. Very occasionally, a high security server, such as a banking application, appears to see XRay and refuses to respond. This is a mystery to me. On the other hand, I have used XRay under Internet Explorer to monitor my own online banking transactions.

  10. If client-side caching of images is in effect, images will not be displayed on the internal browser window. An option to retrieve these images should be implemented. Any sort of cached content will exhibit this behavior.

  11. Purged connections are not removed from the disk when the application exits. This lacuna can clutter your data directory with useless connection files. I should really fix this.
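
As a possible direction for limitation 11, the sketch below shows one way leftover connection files could be removed when the application exits. It is only an illustration under stated assumptions: the class PurgeCleanupSketch is hypothetical, and the ".conn" file suffix is invented for the example, since this document does not say how XRay names its purged connection files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical cleanup helper; not part of the current XRay code base.
class PurgeCleanupSketch {

    // Delete every regular file directly inside dir whose name ends with suffix.
    // Returns the number of files actually removed.
    static int deletePurgedFiles(Path dir, String suffix) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + suffix)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PurgeCleanupSketch <data-directory>");
            return;
        }
        Path dataDir = Paths.get(args[0]);
        // Run the cleanup when the JVM shuts down, i.e. when the application exits.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // ".conn" is an assumed suffix used only for this example.
                int n = deletePurgedFiles(dataDir, ".conn");
                System.out.println("Removed " + n + " purged connection file(s)");
            } catch (IOException e) {
                System.err.println("Cleanup failed: " + e.getMessage());
            }
        }));
    }
}
```

A shutdown hook also runs when the JVM is terminated by an interrupt signal, so the data directory would be tidied even if the user does not leave through the Exit tool button.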

Future Plans

The original concept out of which XRay has devolved was that of a fully functional web browser with interactive web-scraping and robot generation capabilities. Users should be able to generate flexible content retrieval robots on the basis of live data issuing from all local clients and remote servers with respect to which it acts as a kind of web proxy while the user simultaneously browses the web. XRay is thereby "self-proxying" in the sense that HTTP conversations initiated from the GUI are processed as fully as those initiated from an external client. I regard this novel arrangement as the ideal web scraping environment. After all, manually browsing the web is simply a lowest common denominator, primitive form of web scraping. I believe that recent literature that casts this functionality in a pejorative light is benighted: any useful activity that can be automated, of which extracting content from the web is a central example, should be automated to the maximum extent possible. The main purpose of XRay has been, and remains, to fulfill this need. To the best of my knowledge, the XRay concept is unique in its intention to integrate both internal and external client activity as two essential parts of a seamless content extraction utility, but these requirements raise a number of difficult issues.

  1. Sun's answer to Microsoft's web browser control is somewhat pathetic and does not in fact do justice to the notion of a reasonably adequate web browser. It is very likely that we will need to replace the current implementation of the internal client.

  2. Conceptual issues having to do with how to coordinate capture and manipulation of internal and external HTTP messages need to be articulated and addressed at multiple levels of analysis.

  3. The existing code base is practically (but not completely or absolutely) undocumented.

  4. The existing code base is not well designed and will require extensive refactoring before any new functionality can be added. In fact, a total rewrite may be in order.

  5. Ideally, future work on XRay will embody current best practices in software development, drawing upon principles of Extreme Programming, Agile Methods, Aspect Oriented Programming, and design patterns.

Request For Help

For the reasons given, this project will not survive a battery of piecemeal, ad-hoc, uncoordinated changes. Therefore, I am initially looking for a small number of experienced developers to double as collaborators and mentors. High-to-mid level design skills are as important as Java coding skills, and past experience writing multithreaded networking applications and/or multithreaded GUIs (preferably both) in Java is pretty much mandatory. Together we will solve the five problems enumerated above.


Sourceforge Project Summary Page

Other Projects


Author: Ben Tompkins
brtompkins@comcast.net
