Ad Lagendijk, 26 November 2010

Improving plotting programs with better data import

Posted in Presentations quality, useful software

When you look at modern scientific journals, you will find that in most papers at least some of the results are presented in graphical form, ranging from a simple black-and-white X-Y plot to sophisticated multicolor 3D plots.

The data that are graphed come from various sources, such as completed surveys, detector output, or mathematical programs. If the data are gathered or produced by a commercial computer program, the developers of that program will try to supply graphing capabilities within the program, because keeping users tied to a program is of high commercial value. So Microsoft's Word and Excel supply plotting facilities, although the capability is so rudimentary that it hardly deserves the name.

Plots produced by Mathematica
More sophisticated graphing facilities are supplied by mathematical programs like Mathematica, Maple and MathCad. Not surprisingly, many of the developers of these programs seem to have a degree in mathematics, and the results are impressive: Mathematica is of staggeringly high mathematical quality. The disadvantage of being developed by scientists who do not care much about communication is that the presentation quality of the plots from these mathematical programs ranges from poor to horrible. With a few exceptions, like Ian Stewart, mathematicians are notoriously bad at communicating their results to non-mathematicians; natural scientists such as chemists are better in this respect. As a result, the plots produced are of terrible quality: the plots generated by Mathematica could be used by eye doctors to test their patients' eyesight.

Always high quality
When it comes to publishing your results in a journal, you often have to rely on more specialized plotting programs like Origin or SigmaPlot. In my environment the choice is Origin. I think the transition from poor plots to high-quality plots, with large font sizes, thick lines, properly labeled axes, full legends, etc., should not be postponed until publication. In the communication among scientists and group members, too, high-quality graphs raise the level of scientific discussions and results.

So why not use Origin for all the daily plots? There are a number of problems. The default settings for fonts, line thicknesses, colors, etc., are always wrong. Furthermore, changing the standard options is awkward, so delivering high-quality plots is time-consuming. As a result, I am often confronted in my group with poor-quality graphs produced by Origin, all because the developers of this program have not communicated enough with scientists. Of the many feature requests I could come up with, I will present two here that could easily be implemented and would make plotting programs much easier to use.

Feature request 1: comment lines in data files
You will have your data in a database or in some file, such as a simple ASCII file, a comma-separated list, or an SQL file. While plotting your data you often end up in a cycle of operations: (i) you save a file with your new data, (ii) you import this file into your plotting program, (iii) you generate your plot, and on the basis of this plot you regenerate a slightly modified data file, and so on. In my case I typically go through dozens of these cycles before I am satisfied with the results.

I will discuss the simple case of an X-Y plot. You will need a set of data pairs. If you save such a set of pairs with a comma and a linefeed as the separators, any import wizard in the plotting program will be able to read and interpret the file. If you supply the labels for X and Y in the first line (record), the import facility will even recognize these headings. So what is the problem? After a couple of years you will be left with hundreds of these data files and no idea what they represent, because in addition to the X and Y variables the plot is characterized by a set of additional parameters, like temperature, frequency, pH, or whatever. You would like to have these parameters in the file as well. The advantage is that you can use them for the legend, and later, when you reread the file, they will help you understand what is actually in it. So how do you put these additional data in your data file? You could put them in front of the list of pairs. The problem now is that the importer of the plotting program no longer recognizes the file automatically. You have to tell the program again and again, at every cycle, to skip a number of lines. This slows down the plotting cycle considerably and is very irritating.

The solution can be found in any computer language: a comment line. In Unix shell scripts, lines that start with “#” are comments, in Lisp lines that start with “;”, and in C++ and Java lines that start with “//”; all of them are skipped. Whatever comment indicator the plotting software developers introduce, I could not care less, but introducing one would help us a lot. The data file would then be recognized automatically by the importer of the plotting program, whatever information is in the comment lines, and the file could be retrieved and interpreted by the user even after years and years.
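To make the idea concrete, here is a minimal sketch of such a data file and an importer that skips comment lines. It assumes “#” as the comment marker (gnuplot and NumPy's `loadtxt` skip “#” lines by default) and a hypothetical `# key = value` convention for the extra parameters; `write_xy`, `read_xy` and `legend_label` are illustrative names, not part of any plotting program.

```python
import os
import tempfile
from typing import Dict, List, Optional, Tuple

COMMENT = "#"  # assumed comment marker; gnuplot and NumPy's loadtxt also skip '#' lines

def write_xy(path: str, pairs: List[Tuple[float, float]],
             params: Dict[str, str], labels: Tuple[str, str] = ("X", "Y")) -> None:
    """Write X-Y pairs as comma/linefeed-separated records, preceded by comment lines."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in params.items():
            # Hypothetical convention: one "# key = value" line per extra parameter.
            f.write(f"{COMMENT} {key} = {value}\n")
        f.write(f"{labels[0]},{labels[1]}\n")  # header record with the axis labels
        for x, y in pairs:
            f.write(f"{x},{y}\n")

def read_xy(path: str) -> Tuple[Optional[Tuple[str, str]],
                                List[Tuple[float, float]], Dict[str, str]]:
    """Read a two-column file, skipping comment lines but keeping their parameters."""
    header: Optional[Tuple[str, str]] = None
    pairs: List[Tuple[float, float]] = []
    params: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(COMMENT):
                key, sep, value = line[len(COMMENT):].partition("=")
                if sep:
                    params[key.strip()] = value.strip()
                continue  # comment lines never reach the numeric parser
            fields = [s.strip() for s in line.split(",")]
            try:
                pairs.append((float(fields[0]), float(fields[1])))
            except ValueError:
                header = (fields[0], fields[1])  # non-numeric record: the column labels
    return header, pairs, params

def legend_label(params: Dict[str, str]) -> str:
    """Turn the stored parameters into a legend string."""
    return ", ".join(f"{k} = {v}" for k, v in params.items())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run042.csv")  # hypothetical file name
        write_xy(path, [(0.0, 1.0), (1.0, 2.5)], {"T": "300 K", "pH": "7.0"})
        header, pairs, params = read_xy(path)
        print(header, pairs, legend_label(params))
```

Running it prints `('X', 'Y') [(0.0, 1.0), (1.0, 2.5)] T = 300 K, pH = 7.0`: the parameters survive in the file for later reading and feed the legend, while the importer never has to be told how many lines to skip.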

Feature request 2: Automatic reload
All modern operating systems use file mappings. That means that when a file is opened and read, its content is held in (physical or virtual) memory. If the developers are smart, they do not lock the physical file on disk. The advantage of such an unlocked file is that it can be changed on disk by another program. To prevent mishaps, the program should check, when writing the file back to disk, whether the file on disk has been changed by another program in the meantime. If it has, the program should ask the user what to do: reload the new file into memory, or overwrite the changed file on disk with the copy in memory. Any good program allows this “write or reload” decision to be made automatically by setting an option. A software developer will typically use several different editors at the same time, and she would like all of them to work with the newest version of the file automatically. Programs such as CodeWright and SlickEdit supply this very handy feature.
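The “write or reload” check described above can be sketched as follows. This is an illustration, not how CodeWright or SlickEdit actually implement it: it assumes that the modification time plus file size is a good enough fingerprint (a rewrite with the same size within the file system's timestamp granularity would go unnoticed), and the class name and the option values `"ask"`, `"reload"` and `"overwrite"` are hypothetical.

```python
import os
from typing import Tuple

class InMemoryFile:
    """Keep a file's text in memory without locking it, and detect outside changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.reload()

    def _stamp(self) -> Tuple[int, int]:
        # Modification time (ns) plus size as a cheap fingerprint of the file on disk.
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size)

    def reload(self) -> None:
        # Take the stamp *before* reading: if the file changes during the read,
        # the next check reports a change instead of silently missing it.
        self.stamp = self._stamp()
        with open(self.path, encoding="utf-8") as f:
            self.text = f.read()

    def changed_on_disk(self) -> bool:
        return self._stamp() != self.stamp

    def save(self, on_conflict: str = "ask") -> str:
        """Write back; on_conflict is 'ask', 'reload' or 'overwrite' (hypothetical names)."""
        if self.changed_on_disk():
            if on_conflict == "reload":
                self.reload()       # take the other program's version, drop ours
                return "reloaded"
            if on_conflict != "overwrite":
                return "conflict"   # leave the decision to the user
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(self.text)
        self.stamp = self._stamp()
        return "written"
```

Setting `on_conflict` once, instead of answering a dialog every time, corresponds to the automatic option the paragraph describes.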

I would be very happy if plotting programs offered an option to reload automatically when they discover that the data file that is the source of the plotted data has changed. That would shorten the plotting cycle considerably: (i) you run the program that outputs new data, and (ii) you look at the new plot, without any extra mouse clicks or keyboard input, because the import wizard has automatically reloaded the new data file and updated the plots.
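A minimal sketch of the requested automatic reload, assuming simple polling of the file's modification time and size; a real plotting program would more likely use the operating system's change notifications (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows). The `on_change` callback stands in for a hypothetical “reload and replot” routine.

```python
import os
import time
from typing import Callable, Optional, Tuple

Stamp = Optional[Tuple[int, int]]

class DataFileWatcher:
    """Poll a data file and call on_change(path) whenever it has been rewritten."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last: Stamp = None

    def _stamp(self) -> Stamp:
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None  # file momentarily absent, e.g. while it is being replaced
        return (st.st_mtime_ns, st.st_size)

    def poll(self) -> bool:
        """Check once; trigger a reload only if the file differs from last time."""
        stamp = self._stamp()
        if stamp is None or stamp == self._last:
            return False
        self._last = stamp
        self.on_change(self.path)
        return True

    def run(self, interval: float = 1.0) -> None:
        """Poll forever, checking every `interval` seconds."""
        while True:
            self.poll()
            time.sleep(interval)
```

For example, `DataFileWatcher("run042.csv", replot).run()` with a hypothetical `replot` function would redraw the plot each time the data file changes. To keep the plotter from ever reading a half-written file, the program producing the data can write to a temporary file and then move it into place with `os.replace`.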

- - - - - -
  1. Kaan Ozturk

    26 Nov 2010 13:01, Kaan Ozturk

    Gnuplot supports both features you request. Data lines that begin with the character # are ignored, and the “replot” command will update the plot with the latest data on file. I don’t know if it can be configured to do the updates automatically as soon as the file is changed.

    Gnuplot is highly configurable and produces high-quality plots, but its command-line interface is discouraging. My feature request would be a Matlab-like GUI for Gnuplot.

  2. Mirjam

    27 Nov 2010 10:59, Mirjam

    I like gnuplot exactly for its command-line interface. You make one text file with all the plotting commands for a particular kind of graph and all you need to do to plot another dataset with the same graph settings is change the name of the datafile it refers to.
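    The template workflow described here can be sketched in Python as a fixed gnuplot script in which only the data file name (and a few labels) vary. The gnuplot settings below are illustrative choices, not anyone's actual script, and `gnuplot_script` is a hypothetical helper.

```python
from string import Template

# One fixed layout for a particular kind of graph; only data-dependent fields vary.
# These gnuplot settings are illustrative choices (large font, thick lines).
XY_GRAPH = Template("""\
set terminal pngcairo size 800,600 font "Helvetica,18"
set output '$output'
set datafile separator ","
set xlabel "$xlabel"
set ylabel "$ylabel"
plot '$datafile' using 1:2 title "$title" with linespoints linewidth 3
""")

def gnuplot_script(datafile: str, output: str, title: str,
                   xlabel: str = "X", ylabel: str = "Y") -> str:
    """Fill the template for one data file; labels must not contain double quotes."""
    return XY_GRAPH.substitute(datafile=datafile, output=output, title=title,
                               xlabel=xlabel, ylabel=ylabel)

if __name__ == "__main__":
    # Hypothetical file names; save the output as run042.gp and run: gnuplot run042.gp
    print(gnuplot_script("run042.csv", "run042.png", "T = 300 K"))
```

    Only the arguments change from one dataset to the next; the styling lives in one place.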

  3. Cole Van Vlack

    28 Nov 2010 21:07, Cole Van Vlack

    As mentioned by Kaan, Matlab is pretty good at implementing both of your suggestions. In C/C++ and Fortran it is possible to write directly to a Matlab data file instead of an ascii file. This allows all of the other input variables (temperature, simulation time, etc…) to be placed in the file along with the relevant data.

    Matlab scripts (that include commenting…) then allow post processing including setting font and line-width properties. The figure GUI even has an “export current settings to .m file” which allows you to point and click the settings once and then write them as part of your post processing. Even with these capabilities, the process is still onerous unfortunately…

  4. Alexander

    30 Nov 2010 22:47, Alexander

    I am the developer of MagicPlot, a plotting and nonlinear fitting program, and this post is very interesting to me. I want to make some comments about feature requests and users.
    MagicPlot is quite a small application compared with Origin, but it is used by different people (whom I don't know) in different ways, and I know only some of these ways. When I discuss the program with my friends, each of them asks me for something different. In fact, if we tried to satisfy all users, the application would become unusable. On the other hand, programmers can make many small usability enhancements in many small parts of an application, and the application as a whole will be better. But the ‘more functions with the same complexity’ idea is not so easy to implement.
    As for user activity: I get feature requests or opinions only once in a blue moon, and I think that is a problem for many developers. So posts like this are very important.
    P.S. MagicPlot supports feature #1, and feature #2 is planned for a future release. You are welcome to try it out!

  5. Martin Heimann

    5 Dec 2010 0:53, Martin Heimann

    I’m astonished by this comment. Years ago, I also thought that one needs specialized graphing software to make publication-quality (and also good-looking) graphics. Igor (now called “Igor Pro”) was my favorite then: it allows you to fine-tune the graphic on the screen and then save the layout as a script, which can then be used for similar plots, either as a template or for mass production. It also allows automatic reload, since it can be used to monitor and graph the output of a measuring device etc. with continuous updates. However, I believe that modern interpreted languages such as Mathematica, Matlab and others now have such good graphics capabilities built in that specialized software is needed only in rare cases. Personally, I have now switched completely to Mathematica. Both of your feature requests are trivial to meet with just a few lines of code. And the flexibility in making graphs is, to my knowledge, unsurpassed; you can change all styles and aspects with just a few keywords. The program can now import and export practically all available data and graphics formats. And, most usefully, the notebook structure preserves a written record of what you did and allows detailed commenting, which is useful when you go back to a project a few years later. Of course, as with all sophisticated software systems, there is a somewhat steep learning curve, and the program is darn expensive. But I don’t regret the time and money spent.

    (ps. I have no shares of Wolfram Research Inc….)

  6. Alexander

    5 Dec 2010 16:33, Alexander

    Martin Heimann,
    I think the reason is the use of scripts and some programming. I know many people who don’t like scripting; that is why gnuplot is not everyone’s favorite plotting tool. I suppose that learning to script is not the only problem.

  7. Ad Lagendijk

    6 Dec 2010 20:27, Ad Lagendijk

    @Martin
    I will be pretty direct: if you use Mathematica plots straight from the program your plots will be of poor quality. I see it happen so often: thin lines, small fonts, wrong colors.

  8. Alexander

    6 Dec 2010 21:43, Alexander

    Ad Lagendijk,
    Are you talking about the default plot style in Mathematica?

  9. Martin Heimann

    7 Dec 2010 13:46, Martin Heimann

    @Ad,

    true, the Plot and ListPlot default styles are not good. But just add PlotStyle->Thick, BaseStyle->18 and you get perfect graphs with very legible labels. With very little additional effort you have control over every aspect of a graph. I think the problem of bad graphics in publications is not the software but the poor skills of the originator. In the old days, we had graphically skilled personnel to draw the figures for publications. Nowadays, to save money, we no longer employ such staff, and scientists have to make the graphs themselves. I think it is indispensable that master’s and PhD students take a professional course on how to make aesthetically pleasing graphics.