Sunday, November 29, 2009

Those lying numbers


There is more than one way to contribute to open source projects. Unfortunately, Eclipse Dash only shows one aspect of it, the code; and even for this one aspect it does not capture the reality of things. Why is this? Dash counts CVS commits, and this misses the key aspect of who authored the code.


So why am I getting to this today? Because people are using Dash as the scarecrow on diversity, but I believe it does not represent the reality of each project. Here is the case study of p2:

1% for Cloudsmith? 1% for EclipseSource? WTF? This does not look very diverse... Unfortunately, these numbers show exactly what I want to point out: they are bogus. They do not represent the reality of the investment made by those two companies or the number of patches received from individuals. Indeed, Thomas H., Henrik L. and Ian B. have all been regular contributors to the project and know a lot of the code base. In fact I'm sure that if IBM were to pull the plug on the project it would carry on just fine (probably with even more freedom since I would be gone). Their companies have products based on p2 (I believe this to be a sign of commitment given the size of p2), they come to every call, and are not afraid of taking on big issues, etc...


So why are the numbers so low?

  • Patches committed for others. I have been committing a lot of patches either from the community or on behalf of Thomas H. Unfortunately this again inflates the IBM numbers to the detriment of Cloudsmith or "individuals".
  • Lately the code has been very much in flux because of a large refactoring (package rename, etc.), which inflates the commit count and dilutes others' commits.
  • Number of IBM committers. IBM has more committers than others on the project, thus allowing for more code to be produced. However, if those companies were to increase their number of participants (wink, wink) to a number equal to that of IBM, they would then be on par. Maybe we should compare the companies based on the average number of commits per committer (e.g. commitCount / committerCount).
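
To make the commits-per-committer idea concrete, here is a minimal sketch. The commit counts and committer head-counts below are invented for illustration (they are not taken from Dash), and the function names are hypothetical.

```python
# Hypothetical figures, NOT real Dash data: total commits and number of
# committers per organization.
commits = {"IBM": 900, "Cloudsmith": 60, "EclipseSource": 40}
committers = {"IBM": 9, "Cloudsmith": 1, "EclipseSource": 1}


def share(values):
    """Return each organization's fraction of the total."""
    total = sum(values.values())
    return {org: value / total for org, value in values.items()}


def commits_per_committer(commits, committers):
    """Average commits per committer, skipping organizations with no committer."""
    return {
        org: commits[org] / committers[org]
        for org in commits
        if committers.get(org)
    }


raw_share = share(commits)                       # what a raw commit count shows
per_head = commits_per_committer(commits, committers)
normalized_share = share(per_head)               # share after normalizing by head-count
```

With these made-up inputs the raw share of the largest organization is 90%, while its share of per-committer activity is 50%, which is the kind of gap the normalization is meant to expose.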


I'm sure that I'm missing other factors that explain why those numbers are so low, but you get the point... Though I recognize that almost every project could use a little more diversity, we have to be careful about how numbers are being used. If we want to use Dash as a reliable hint of activity and diversity, then we should revise how the numbers are computed to take into account: patch author instead of committer, activity in bugs, activity on mailing lists, number of people asking questions in forums, etc...
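
As a rough illustration of counting by patch author instead of committer, the sketch below tallies the same log both ways. The log entries and field names are hypothetical; a real CVS log does not carry a separate author field, so that information would have to come from somewhere else (for example the bug the patch was attached to).

```python
from collections import Counter

# Hypothetical commit log: "committer" is who pushed the change, "author" is
# who wrote the patch (None when the committer wrote it).
commit_log = [
    {"committer": "IBM", "author": None},
    {"committer": "IBM", "author": "Cloudsmith"},
    {"committer": "IBM", "author": "individual"},
    {"committer": "EclipseSource", "author": None},
]


def count_by_committer(log):
    """Tally commits the way a commit-based dashboard would."""
    return Counter(entry["committer"] for entry in log)


def count_by_author(log):
    """Tally commits by patch author, falling back to the committer."""
    return Counter(entry["author"] or entry["committer"] for entry in log)


by_committer = count_by_committer(commit_log)
by_author = count_by_author(commit_log)
```

In this made-up log, three of four commits are credited to one organization when counting committers, but only one of four when counting authors.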

Project diversity

When a project is not diverse, I find it a bit too easy to blame it on the actual company who started the project, the people steering it or how inviting to contribution the project is. Though I don't want to underplay those points, I argue that there are other things that matter:

Relevance of the project: maybe the topic of your project does not interest anyone but you or your company. Sorry.


Timing of the project: you can have the coolest technology, if you are too late or too early, it will be harder to excite the crowds.


Pace of the project: the project goes too fast for others to follow or for committers to accept external contributions (e.g. the pace imposed by the company's internal schedule is such that it does not allow for the external contributions to be considered by committers). Conversely, the project goes too slow for anyone to be willing to bet on it.


Quality of the project: the project is running well, builds are regular, deliverables are on schedule, bugs and enhancements are dealt with quickly. The community gets what it needs, why should it care?


Amount of code: is there enough code to show the direction of the project? People are happy to work on a project, but I think they feel more comfortable starting from a working code base than from a whiteboard.


Now if I relate that to p2, where we have a diverse community of contributors (IBM, Cloudsmith, EclipseSource, University of Lille-Artois), I think we have been lucky because p2 came at the right time, solving a real pain point. Indeed, the same year we announced p2 (EclipseCon 2007), there were at least 5 talks on how to manage Eclipse, and Cisco announced the creation of the Mayinstall project. As far as code goes, we already had a functional prototype and we continued developing it in the open, holding public calls every week. All that said, I believe that p2 would still be a single-company-developed project if the companies that joined forces had not had a business interest in contributing.


Where does that leave us? Luck. I would argue that, much like any other success, the success and diversity of a project just happen to be a combination of preparation and timing, with of course a pinch of hard work and persistence.


Friday, November 27, 2009

Nesting categories

A recurring topic around categorization in p2 is the ability to nest categories.


As with the categorization of bundles, the trick consists of using the Eclipse feature editor to express the dependencies and thus construct the desired nesting.

Steps:
First phase, creation of the inner category.
  1. Create a feature project called InnerCategory1. In our example this will be the feature containing the elements to be shown categorized.
  2. Turn this feature into a category by creating a p2.inf and filling it with:
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true
  3. Remove everything from the build.properties of the feature.
  4. Use the features and plug-ins tab of the feature editor to add content to be categorized.
This concludes the creation of the inner category. The following steps create the "top level" category.
  1. Create a feature project called TopLevelCategory.
  2. Turn this feature into a category by creating a p2.inf and filling it with:
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true
  3. Remove everything from the build.properties of the feature.
  4. In the features tab of the editor, add InnerCategory1.
  5. Export the top level feature, enabling metadata generation.
You can find the code of this example on the wiki.
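
If you prefer to script the two file edits (writing p2.inf and emptying build.properties), here is a minimal sketch. It assumes the feature project already exists on disk and that p2.inf sits at the root of the feature project; the function name is hypothetical, and the remaining steps (adding content in the feature editor, exporting) are still done in the IDE.

```python
from pathlib import Path

# The two property lines from the steps above, exactly as they go into p2.inf.
CATEGORY_P2_INF = (
    "properties.1.name=org.eclipse.equinox.p2.type.category\n"
    "properties.1.value=true\n"
)


def make_category(feature_dir):
    """Mark an existing feature project directory as a category.

    Writes p2.inf with the category property and empties build.properties.
    Returns the path of the p2.inf that was written.
    """
    feature_dir = Path(feature_dir)
    p2_inf = feature_dir / "p2.inf"
    p2_inf.write_text(CATEGORY_P2_INF)
    (feature_dir / "build.properties").write_text("")
    return p2_inf
```

For the example above you would call it once per feature project, e.g. `make_category("InnerCategory1")` and `make_category("TopLevelCategory")`, with paths adjusted to wherever your workspace keeps those projects.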
Happy categorization, happy provisioning!

Categorizing plug-ins

Despite what is commonly believed and what I have repeated on several occasions, p2 does allow for the installation of anything, anything being for example just bundles. It just happens that today, with the practices adopted over the years, features have grown to be the primary way of delivering functionality.
I'm now showing how to create a p2 repository whose category refers to bundles.
Steps:
  1. Create a new feature project.
  2. Create a file named p2.inf in the feature project. This adds the category property to the installable unit, thus allowing the UI to recognize the feature as a category to be displayed. Paste in the following two lines:
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true

  3. Include the plug-ins you want to see being categorized. This is where you define what goes in the category. Each entry in the plug-ins list will be shown under the category.

  4. Remove everything from the build.properties included in the feature. This causes PDE to not generate a feature.jar.
  5. Export the feature enabling metadata generation.

You can find a zip of this example on the wiki.
Happy categorization, happy provisioning!

Sunday, November 01, 2009

p2 is going public

I mean API public. Therefore we are soliciting input from people who have been using our provisional API or people that are looking into using it. To do so, please open a bug report capturing your use cases as well as stating the problems you have been experiencing with the current provisional API.

Thanks in advance.

Thursday, August 27, 2009

p2 community contribution

If you are interested in contributing to p2 during the Helios (3.6) cycle, please add your areas of interest on the wiki. These can be either new functionality or things that you would like to see improved.

Expressing your desire early will allow for other contributors to potentially join you and for the current set of p2 committers to plan for coming contributions.

Thanks in advance.


Monday, August 03, 2009

p2 EclipseCon 09 slides posted

I have just posted the slides for the talk and tutorial I gave on p2 at EclipseCon 2009.
The talk goes over the p2 functionalities from a runtime perspective and the build / dev time aspects.
The tutorial covers all the major aspects of p2. It ranges from the simple usage of product delivery to an in-depth presentation of the p2 concepts.
Enjoy!

Thursday, July 30, 2009

p2 metadata and resolution detailed

Daniel Le Berre (one of the authors of SAT4J) and I have had a paper accepted at IWOCE. The paper focuses on the dependency management aspect of p2. It describes the metadata used to express dependencies, the overall functioning of our resolver, and our propositional-constraint-based encoding. To conclude, we describe the challenges to address in future releases.

The paper is available at http://www.cril.univ-artois.fr/spip/publications/iwoce907-leberre.pdf
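
To give a flavor of what encoding dependencies as propositional constraints means, here is a generic toy sketch. It is not the encoding from the paper: the unit names and dependencies are invented, and the brute-force search stands in for a real SAT solver, which is what you would hand such clauses to in practice.

```python
from itertools import product

# Hypothetical installable units: one boolean variable per unit,
# True meaning "installed".
UNITS = ["app", "libA_1", "libA_2", "libB"]

# Each clause is a list of (unit, wanted) literals; a clause holds when at
# least one literal matches the assignment.
CLAUSES = [
    [("app", True)],                                       # app must be installed
    [("app", False), ("libA_1", True), ("libA_2", True)],  # app requires libA_1 or libA_2
    [("app", False), ("libB", True)],                      # app requires libB
    [("libB", False), ("libA_2", True)],                   # libB requires libA_2
    [("libA_1", False), ("libA_2", False)],                # at most one version of libA
]


def satisfies(assignment, clauses):
    """True when every clause has at least one matching literal."""
    return all(
        any(assignment[unit] == wanted for unit, wanted in clause)
        for clause in clauses
    )


def solutions(units, clauses):
    """Enumerate every assignment that satisfies all clauses (brute force)."""
    for values in product([False, True], repeat=len(units)):
        assignment = dict(zip(units, values))
        if satisfies(assignment, clauses):
            yield assignment


valid = list(solutions(UNITS, CLAUSES))
```

With these made-up constraints the only valid installation is app, libB and libA_2 together, without libA_1.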

OSGi DevCon Europe 2009 slides

I have finally uploaded the slides of my OSGi DevCon talk on slideshare. The presentation goes over some of the p2 concepts and talks for the first time about 3 different ways to set up p2 depending on the constraints of your environment: milli, micro, nano.

Wednesday, June 03, 2009

p2 at OSGi DevCon Europe / Jazoon

If everything goes well (read: my travel request gets approved), I will be presenting p2 at OSGi DevCon Europe / Jazoon in Zurich on June 22nd.

If you want to meet to chat about p2, build or OSGi, please drop me a note (esp. if you are an IBM Rational customer) and we will schedule something during my short visit in Switzerland.


Tuesday, May 19, 2009

What is the Galileo repo?

The Galileo repository is a one-stop shop for all the bits and pieces of this release and thus makes consumers' lives easier. It guarantees by construction that all the pieces that are available from it are meant to work together.
However, this does not mean that you will be able to select all the entries from the repository and successfully install them on top of your SDK. Why? Because this repository contains things like SDKs and runtimes (e.g. Riena, Swordfish, etc.) that are meant to be installed in your target rather than your running instance.

Monday, May 18, 2009

p2, call for community testing

During 3.5, a release put under the theme of robustness for our team, p2 went through a lot of changes: new UI, improved error reporting, more robust downloads, improved transactionality of the installation... the list is endless. About 900 bugs have been closed.

However, in order to avoid releasing 3.5 with big issues (which we think/hope we don't have), I'm calling on you to try out p2 by downloading RC1, giving it a spin and a hard time, and reporting any problems in the p2 component here.

Thanks in advance.

Tuesday, March 31, 2009

p2 related projects for GSOC

I have added a few new projects relating to p2 to the Google Summer of Code idea page. This brings the count of p2 related projects to 4. If you are a student and want to sign up for any of these topics or submit new ideas, feel free to do so. But don't forget to voice your interest on soc-dev@eclipse.org.
To cut down the search through the proliferation of ideas, here are the 4 p2 topics:

Web triggered installation
The goal of this project is to develop a mechanism by which the installation of a plug-in can be triggered from a single click on a webpage, thus facilitating the extension of Eclipse for non-typical users. Some of the challenges of this project are: communication with the running instances of Eclipse, identification and presentation of running Eclipse instances, "security".

Provisioning OSGi clouds
With very clearly defined boundaries between its different components, p2 offers a lot of possibilities when it comes to setting it up in a multi-tiered provisioning system. The goal of this project is to provide a configuration of p2 that can be used to deploy OSGi-based applications in the cloud. This will include the creation of an Equinox EC2 image, exploration of how configurations can be managed, and also how to leverage the cloud storage capabilities.

Power-user p2 views
The current Eclipse p2 Update UI is targeted to both Eclipse SDK users and end-users of Eclipse-based products. Since the UI must support users with less knowledge than the typical Eclipse SDK user, the end result is a wizard-style (modal), task-based approach. Many power users, in particular Linux users, prefer a modeless, "dashboard" style of interaction. The current p2 "Admin UI" is targeted toward p2 developers who need to see every detail of the underlying p2 model objects. There is a need for something in between these two extremes. Users have mentioned Linux package management front ends such as Synaptic Package Manager as appropriate UIs for this audience. If appropriate, this UI could replace the p2 admin UI, but it's not clear if that should be the goal.

Debugging aid for p2 installation issues
Debugging cases where features cannot be installed into Eclipse due to inability to reconcile requirements of existing features and the new features being installed is quite challenging. It would be useful to create a tool for capturing the details of the environment and what was being installed to aid in reporting problems. Once it is possible to generate these dumps, a graphical explorer tool that would allow the developers to trace the dependencies and see the problems would make it significantly easier to debug these problems.

Saturday, March 21, 2009

p2 content at EclipseCon

This year is a good year for p2 content at EclipseCon. Of the 26 submitted sessions that mentioned p2 in their abstract, 13 have been accepted.

The p2 fest starts with our tutorial on p2, Monday afternoon. However, I'm sure that Andrew N. will have discussed p2 integration in the build in the context of the Common builder tutorial in the morning. Also on Monday afternoon is Kai's RCP tutorial, which seems to describe the usage of p2 in an RCP application.

Then on Tuesday the party continues with Jeff's talk on the Runtime (r)evolution which I'm sure will mention p2 and also with Richard and Markus who have been busily creating p2-enabled Galileo packages.

However Wednesday is "the" p2 day with:
  1. Henrik and Thomas's talk about Buckminster and p2
  2. Darin and zx's talk on PDE
  3. My talk on what's new in p2
  4. The short talk session on web-centric technology, with the newly p2-rebased Yoxos and the EPP Wizard (but I'm not sure how much p2 there will be in those talks).
  5. The short talk session on runtime deployment, where server side provisioning with p2 and the complexity of versioning in a provisioned world are discussed.
  6. And finally the p2 BOF.
I would be surprised if by the end of the day I was still able to talk.

Have a safe trip and see you there.

Wednesday, February 25, 2009

FindBugs review

Lured by the promised land of a code base free of any problem, I decided to give FindBugs a try on my freshly updated I-build.
Unfortunately FindBugs did not live up to expectations, since it failed with an NPE, which you will admit is rather ironic for such a tool...
Anyway, the last time it worked for me (a few months back), my experience was in the end not really more useful than today's, since the tool returned a plethora of false positives hiding the real bugs it may have found. After talking to colleagues, we all came to the same conclusion: it could be useful if it had more reasonable defaults. Oh well...

Thursday, February 19, 2009

Thoughts on Bespin and Eclipse

I'm really stunned by the quite opposite reactions that the Mozilla and the Eclipse communities are having about Bespin.
On the one hand, it is like Bespin is the best thing since sliced bread ("hey look now I have a cool editor in my web browser"); on the other, a quite mixed reaction is coming ("why the heck would I want to ever do this", "where is my code completion, type hierarchy", etc.).
Honestly, I'm not really surprised by this reaction from the Eclipse community, especially when you have been around long enough to remember the early days of CDT and the complaints it was receiving because it was not up to the task in comparison to JDT... or more recently on other fronts when you see bugs titled like "XYZ sucks", and the difference in participation on a new effort. Feels like our community is spoiled.

So where do I stand on Bespin? Well, somewhere in the middle. I'm not impressed by the overall Bespin concept because it only solves one little part of the problem of creating an IDE: the snappy text editor and the collaboration. I like it, I like the fact that it is done with HTML5 canvas, the way the extensibility is working to invoke commands, etc. However, when it comes down to running a full-fledged IDE, I'm skeptical. Not about the abilities of the browser or canvas to scale (remember that a few years back Java was declared dead for rich client apps?), but simply because I know what it takes to have all the features that make you so productive in Eclipse. Most of them can't just work off one or two source files, at least not to give you assistance to the degree you are used to. We can probably get the information back from the server, but would it get back fast enough? We could have less precise operations, but are we ready to go with this?

All that said, I believe that there is a place where a Bespin based IDE (or the like) can be useful as a lightweight tool to do a quick task... (peer programming like scenarios, just need to fix a typo in the file, provide translation, etc.). But I don't think we are quite there yet, and there are a lot of cool problems to solve. For example, the model as to where the resources being edited are coming from is unclear. One possibility is to assume that everything I need is already available on a remote server "on the cloud" (or my buddy's machine); then I can just edit this. Now, if my principal model is to work in a regular IDE and try to use my web IDE to just perform some mundane tasks, then the overhead of locating and opening the file would rather be low (unless of course I can directly edit the code straight out of the code repository and put it back there on save (see below)).

So what would I want to see / do / explore?
  • Explore with running a more complete headless Eclipse in the cloud. 
There I'm curious about the response time to get the result of, for example, a code completion or browsing a type hierarchy over the wire when hosted on a real cloud service. From there, I would want to see if any benefit can be had from having a caching proxy / partial replica of what's running on the cloud (and probably an augmented version of this with more functionality) on the user's machine so we can have better response time. For example, maybe my server on the cloud does not offer completion or type hierarchy browsing, but this can be done through my local server. Also, this would solve the disconnected mode enigma with browser-based IDEs.

  • Following from this, offering an "upgrade" path from the browser to the local experience. 
This addresses the case where I've started doing this mundane task but it turns into a bigger thing than expected and I need a complete IDE.

  • Making the workspace cloud friendly. 
When I hear cloud, I hear scalability, 100s of machines running my app. However I don't believe that our current workspace format is really that replication friendly. Do we need this? I don't know.

  • Explore DVCS as the underlying storage of a workspace. 
Would the usage of a DVCS server help in solving the remote server / local proxy synchronization that would result from the proxying solution mentioned earlier? Would that also facilitate handling a case where multiple people would work on the same workspace at the same time? And also, does that make the workspace more cloud friendly?

  • Could Bespin provide the web browser implementation of the eclipse app model (aka 20 things). 

  • Explore single sourcing (generative technique, model based, ...) of commands to make it easy to target both the web-based and client-based IDEs.

  • Finally and unrelated to all that is to see if we can have an SWT for HTML5 canvas. 
Given that canvas seems rather new, it could bring to the browser fans the maturity of SWT and it could be a good space to win esp. if we want to ease single sourcing.


Caveat: this is just the fruit of my imagination, as I have not been involved in the Mozilla/Eclipse meeting.