There is more than one way to contribute to open source projects. Unfortunately, Eclipse Dash only shows one aspect of it, the code, and even for this one aspect it does not capture the reality of things. Why is this? Dash counts CVS commits, and this misses the key aspect of who authored the code.
So why am I getting to this today? Because people are using Dash as the scarecrow on diversity, but I believe it does not represent the reality of each project. Here is the case study of p2:
1% for Cloudsmith? 1% for EclipseSource? WTF? This does not look very diverse... Unfortunately these numbers show exactly what I want to demonstrate: they are bogus. They do not represent the reality of the investment made by those two companies or the number of patches received from individuals. Indeed, Thomas H., Henrik L. and Ian B. have all been regular contributors to the project and know a lot of the code base. In fact I'm sure that if IBM were to pull the plug on the project, it would carry on just fine (probably with even more freedom since I would be gone).
So why are the numbers so low?
- Patches committed for others. I have been committing a lot of patches either from the community or on behalf of Thomas H. Unfortunately this again inflates the IBM numbers to the detriment of Cloudsmith or "individuals".
- Lately the code has been very much in flux because of a large refactoring (package rename, etc.), which inflates the commit count and dilutes others' commits.
- Number of IBM committers. IBM has more committers than others on the project, thus allowing for more code to be produced. However, if those companies were to increase their number of participants (wink, wink) to match IBM's, they would then be on par. Maybe we should compare the companies based on the average number of commits per committer (e.g. commitCount / committer).
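
To make the commitCount / committer idea concrete, here is a minimal sketch. The committer names and commit counts are made up purely for illustration (they are not p2's real numbers), and `commits_per_committer` is a hypothetical helper, not something Dash provides.

```python
from collections import defaultdict


def commits_per_committer(commits):
    """Return {org: commit count / number of distinct committers}."""
    counts = defaultdict(int)
    people = defaultdict(set)
    for org, committer in commits:
        counts[org] += 1
        people[org].add(committer)
    return {org: counts[org] / len(people[org]) for org in counts}


# Hypothetical data: (organization, committer) for each commit.
sample = [
    ("IBM", "alice"), ("IBM", "alice"),
    ("IBM", "bob"), ("IBM", "bob"),
    ("IBM", "carol"), ("IBM", "carol"),
    ("Cloudsmith", "dave"), ("Cloudsmith", "dave"),
]

averages = commits_per_committer(sample)
# Raw commit share is 6 vs 2, yet the per-committer average is the same.
print(averages)
```

With this made-up data, IBM owns three quarters of the raw commits, but both organizations average two commits per committer, which is the kind of difference the raw count hides.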
I'm sure that I'm missing other factors about why those numbers are so low, but you get the point... Though I recognize that almost every project could use a little more diversity, we have to be careful about how numbers are being used. If we want to use Dash as a reliable hint of activity and diversity, then we should revise how the numbers are computed to take into account: patch author instead of committer, activity in bugs, activity on mailing lists, number of people asking questions in forums, etc...
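
As a rough illustration of the "patch author instead of committer" point, the sketch below computes each organization's share twice, once by who committed and once by who wrote the patch. The records and the field names `committer_org` and `author_org` are assumptions invented for this example; CVS itself does not record a separate author, so that information would have to come from somewhere else (bug attachments or commit comments, for instance).

```python
from collections import Counter


def share_by(commits, key):
    """Return each organization's fraction of commits, grouped by `key`."""
    totals = Counter(commit[key] for commit in commits)
    n = sum(totals.values())
    return {org: count / n for org, count in totals.items()}


# Hypothetical records: every commit was pushed by an IBM committer,
# but half of the patches were written by someone else.
sample_commits = [
    {"committer_org": "IBM", "author_org": "IBM"},
    {"committer_org": "IBM", "author_org": "IBM"},
    {"committer_org": "IBM", "author_org": "Cloudsmith"},
    {"committer_org": "IBM", "author_org": "individual"},
]

by_committer = share_by(sample_commits, "committer_org")
by_author = share_by(sample_commits, "author_org")
print(by_committer)
print(by_author)
```

Counting by committer gives IBM everything here, while counting by author splits the same commits between IBM, Cloudsmith and an individual contributor.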