Analysis: The law of diminishing returns

The law of diminishing returns is a well-known concept in economics. Highly simplified, it states that as you invest more, the overall return on investment increases at a declining rate. I wondered if this principle applies to biomedical research.

I thus wrote a small script to parse the Medline database and count for each year 1) the number of new papers published, 2) the number of authors that published at least one paper, and 3) the total number of (co-)authorships. The plot below shows the number of new papers and the number of active authors for each year since 1970:

[Figure: Exponential growth in the number of papers and authors]
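The counting step described above can be sketched roughly as follows. This is not the original script: it assumes the Medline records have already been parsed into (year, list of author names) pairs, and the function name `yearly_counts` and the toy records are purely illustrative.

```python
from collections import defaultdict

def yearly_counts(records):
    """Return {year: (papers, active_authors, authorships)}.

    `records` is an iterable of (year, author_names) pairs, one per paper.
    Authors are identified by name string only (an assumption of this sketch).
    """
    papers = defaultdict(int)
    authorships = defaultdict(int)
    active = defaultdict(set)
    for year, authors in records:
        papers[year] += 1                 # 1) new papers
        active[year].update(authors)      # 2) names with at least one paper
        authorships[year] += len(authors) # 3) total (co-)authorships
    return {y: (papers[y], len(active[y]), authorships[y]) for y in sorted(papers)}

# Toy records (hypothetical, not Medline data).
records = [
    (1970, ["Smith J", "Jones A"]),
    (1970, ["Smith J"]),
    (1971, ["Jones A", "Lee K", "Wang Y"]),
]
counts = yearly_counts(records)
for year, (n_papers, n_authors, n_authorships) in counts.items():
    print(year, n_papers, n_authors, n_authorships)
```

Using a set per year means each name is counted once as an active author no matter how many papers it appears on, while authorships count every appearance.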

Few scientists – if any – will be surprised to see that the rate of publication and the number of active publishing scientists have increased exponentially. However, it is slightly disconcerting that the number of active authors doubles every 17 years whereas the number of papers per year doubles only every 22 years.
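One common way to estimate a doubling time from yearly counts is to fit a straight line to log2(count) against year and invert the slope. The post does not say how the doubling times were estimated, so the sketch below is only one plausible approach; the function name `doubling_time` and the synthetic series are assumptions for illustration.

```python
import math

def doubling_time(years, counts):
    """Fit log2(count) = a + b * year by least squares and return 1 / b."""
    n = len(years)
    logs = [math.log2(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # doublings per year
    return 1.0 / slope

# Synthetic series constructed to double every 17 years (not real data).
years = list(range(1970, 2008))
counts = [1000 * 2 ** ((y - 1970) / 17) for y in years]
print(round(doubling_time(years, counts), 2))
```

On real counts the fit would not be exact, and the estimate depends on which years are included.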

To look deeper into this, I plotted, as a function of time, the average number of coauthors per paper and the average number of papers coauthored by each active author:

[Figure: Exponential increase in the number of authorships per paper and per author]

These two measures also appear to increase exponentially. However, the number of coauthors per paper is increasing considerably faster than the number of papers coauthored by each author per year. The estimated doubling times are 33 years for the number of coauthors per paper and 63 years for the number of papers coauthored. In other words, each paper is shared among more and more authors, while each author's name appears on only slightly more papers. This suggests that the productivity of biomedical scientists, measured in terms of publications, has decreased.
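As a quick sanity check on these numbers: the total number of authorships equals papers times coauthors per paper, and also active authors times papers per author. If all four quantities grow exponentially, the growth rates (reciprocal doubling times) on each side should add up to roughly the same value. The helper name below is illustrative, and the check uses only the four doubling times quoted in the post.

```python
def combined_doubling_time(*doubling_times):
    """Doubling time of a product of exponentially growing quantities.

    Growth rates (1 / doubling time) add when the quantities are multiplied.
    """
    return 1.0 / sum(1.0 / t for t in doubling_times)

# Authorships = papers (22 y) x coauthors per paper (33 y)
via_papers = combined_doubling_time(22, 33)
# Authorships = active authors (17 y) x papers per author (63 y)
via_authors = combined_doubling_time(17, 63)

print(f"{via_papers:.1f} {via_authors:.1f}")  # prints "13.2 13.4"
```

The two routes agree to within a fraction of a year, so the four estimates are at least mutually consistent.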

A more direct way to show this is to plot the ratio between the number of papers published each year and the number of authors on them (note that the y-axis does not start at zero):

[Figure: The productivity in terms of papers is decreasing]
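The size of this decline can be roughly cross-checked from the doubling times quoted earlier (22 years for papers, 17 years for active authors). The sketch below assumes that the ratio is papers per active author and that the data run from 1970 to 2007; both are assumptions, and the function name is illustrative.

```python
def ratio_change(t_numerator, t_denominator, years):
    """Factor by which numerator/denominator changes over `years`,
    given the doubling times of the numerator and the denominator."""
    return 2 ** (years * (1 / t_numerator - 1 / t_denominator))

# Papers double every 22 years, active authors every 17 years.
change = ratio_change(22, 17, 2007 - 1970)
print(f"{(1 - change) * 100:.0f}% decline")  # prints "29% decline"
```

Under these assumptions the ratio falls by a little under 30 %, which is close to the drop of roughly one third reported for this ratio.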

The fact is that the number of papers produced per researcher per year has dropped by roughly one third since 1970. However, there could be many reasons for this:

Have we simply become lazy?
Has the bar been raised for what is considered the Least Publishable Unit?
Are large collaborations less efficient than smaller projects?
Do we spend more time on bureaucracy and less time on science?
Or are we left with the hard questions because the easy ones have all been answered?

My guess is that the last three reasons all play important roles. What do you think?

5 thoughts on “Analysis: The law of diminishing returns”

rodinsky February 12, 2008 at 13:34

Hi Lars,

Nice blog! And interesting finding that there are fewer papers per author.

May I offer one more hypothesis for why the number of authors increases faster than the number of papers per author?

People now work in big groups in which there are a few creative minds and a lot of manual labour and temporary work (like doing a PhD before leaving for industry, etc.). The productivity of scientific workers might thereby have become more polarized. Also, everyone now makes it onto the author list, in a more democratic fashion. If this scenario holds, one would expect the shape of the distribution of authorships to change as well, towards increased variability (a lot more people with few papers). So you could check this hypothesis…
I don't think that we are left with the hard questions and that the easy ones have been answered. 30 years is a drop of water in the ocean of the scientific unknown…
Best!

Lars Juhl Jensen (post author) February 12, 2008 at 13:44

Very good point that everyone makes it onto papers in a more democratic fashion now than earlier (which is obviously a good thing). In particular, I think that lab technicians were generally left out in the old days, whereas it has now become more common to include them on the author list of experimental papers. Actually, a colleague suggested this to me when I showed him the plots, but I forgot about it when I wrote the blog post.

Lars Juhl Jensen (post author) February 12, 2008 at 17:15

I have looked at the distribution of authorships, but there is a problem: in 2007 there were 9908 papers published by Wang Y, 8642 by Zhang Y, 8140 by Li Y and 8122 by Wang J etc. The problem is obviously that there are many authors with identical names. And just for the record: all the top-50 names in 2007 are Asian.

The first question is how this affects the statistical results I presented in the post. The pooling of authors with the same name will cause me to underestimate the number of active authors per year. This alone will cause me to underestimate the growth in the number of authors, since the fraction of authors that become pooled will increase with the number of authors. This effect is further enhanced by demographic changes. Since pooling does not affect the number of papers published, it means that I have underestimated the decrease in productivity (as measured in terms of papers published per active researcher). The general conclusions thus stay unchanged.
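The claim that the pooled fraction grows with the number of authors can be illustrated with a small calculation. If each author's name is drawn independently from a fixed name distribution, the expected number of distinct names among n authors is the sum over names of 1 - (1 - p)^n. The sketch below uses a made-up Zipf-like distribution over 50,000 names; the distribution, its size, and the function name are assumptions for illustration, not properties of Medline.

```python
def expected_distinct_names(n_authors, name_probs):
    """Expected number of distinct names when each of `n_authors`
    independently draws a name with the given probabilities."""
    return sum(1 - (1 - p) ** n_authors for p in name_probs)

# Hypothetical Zipf-like name frequencies (not derived from real data).
n_names = 50_000
weights = [1 / rank for rank in range(1, n_names + 1)]
total = sum(weights)
name_probs = [w / total for w in weights]

ratios = []
for n in (1_000, 10_000, 100_000):
    observed = expected_distinct_names(n, name_probs)
    ratios.append(observed / n)
    print(n, round(observed / n, 3))
```

The printed ratio of distinct names to true authors shrinks as n grows, so counting names underestimates a large author population by a larger factor than a small one.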

Unfortunately, this affects the distribution of authorships in a very bad way, which means that measuring the standard deviation of the distribution as suggested by rodinsky will not work. The obvious solution would be to instead use a measure that is robust to outliers (e.g. Zhang Y) such as the median absolute deviation. However, that will also not work. Because more than half of the authors have published exactly 1 paper in a given year, the median becomes 1 and the median absolute deviation becomes 0.
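A small illustration of why the median absolute deviation collapses here, using made-up counts in which most names have exactly one paper and one pooled name has thousands (the values are invented apart from the 9908 quoted above):

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])

# Hypothetical papers-per-name counts for one year: six names with a single
# paper, three with a few, and one heavily pooled name.
papers_per_name = [1, 1, 1, 1, 1, 1, 2, 3, 5, 9908]

print(median(papers_per_name))                     # prints 1.0
print(median_absolute_deviation(papers_per_name))  # prints 0.0
```

Since more than half of the deviations from the median are exactly zero, their median is zero regardless of how the rest of the distribution looks.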

Any suggestions for how to work around this problem?

rodinsky February 14, 2008 at 11:07

Thank you Lars, for following up on my comment.
Answering your last question, I personally dislike the use of statistics. Not the science of statistics, but statistics that summarize data into a single number. So I would just plot the distributions of the number of papers per author every five years or so. Also, it would be interesting to see the share of papers per author (for each author, each paper counted as one divided by the number of authors on that paper).
However, the point you made about names is serious and makes the whole exercise doubtful. We could assume that there is more overlap of names now than before, as you did above. As regards the distributions, if we see that the number of people publishing little has increased, even with the name bias, we should be able to conclude that there are definitely a lot of authors who previously would probably not have made it onto the author list, or that there is a higher turnover of scientific staff.
Another interesting point would be to see the impact of the difference in distributions on your results. To what extent do they justify your findings?
Let’s think about this…

Best regards,

Rodrigo
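The two quantities suggested in this comment could be computed along the following lines. This is a minimal sketch, assuming one year's papers are available as lists of author names with no name repeated within a paper; the function names and toy data are hypothetical, and the name-pooling problem discussed above would affect both results.

```python
from collections import Counter, defaultdict

def papers_per_author_histogram(papers):
    """Map 'number of papers' -> 'number of authors with that many papers'."""
    per_author = Counter(a for authors in papers for a in authors)
    return Counter(per_author.values())

def fractional_shares(papers):
    """For each author, sum 1 / (number of authors) over their papers."""
    shares = defaultdict(float)
    for authors in papers:
        for a in authors:
            shares[a] += 1 / len(authors)
    return dict(shares)

# Toy data for a single year (hypothetical).
papers = [["A", "B"], ["A"], ["B", "C", "D"], ["A", "C"]]

histogram = papers_per_author_histogram(papers)
shares = fractional_shares(papers)
print(sorted(histogram.items()))  # prints [(1, 1), (2, 2), (3, 1)]
print(round(shares["A"], 2))      # prints 2.0
```

The fractional shares sum to the number of papers, so they divide the year's output among authors without counting any paper more than once.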

Pingback: Commentary: We apologize « Buried Treasure

Buried Treasure

A computational biologist cleans up his disk
