We tried to create an atmospheric stimulus for the different roles of developers and customers by projecting large background slides. That was easy to do, but it did not seem to work well, since most participants rarely looked at the slides.
We used an abstract difficulty measure (bricks) as the estimation unit. The participants seemed to accept it easily and only very seldom spoke in terms of time instead.
Spikes should be limited. For the very small tasks we prepared (inflating balloons, sorting cards, folding paper), a spike was almost the same as actually doing the task, which does not simulate the idea of spikes well. We should think about other tasks that lend themselves better to simulated spikes.
An iteration time of 180 seconds was fine.
Preparation of the iteration, i.e. the planning, took too long in relation to the iteration itself.
Three iterations are too few to learn about capacity and estimation accuracy. There always seems to be a slow start, followed by an overly successful second iteration. Not until the third iteration does the team capacity begin to settle at a meaningful and stable value. Hence five or even more iterations might show this behaviour better than three.
The granularity of tasks was fine. The team should be able to accomplish at least five tasks in each iteration, simply in order to get enough data, to allow wrong estimations to compensate for each other, and to make use of the whole spectrum of difficulty values.
Cards should not be brand new when they are to be used for building houses.
Folding hats and boats from paper sheets is particularly nice, since they can easily be transformed into each other.
For the acceptance of inflated balloons we used prepared pieces of cord. It is better not to show these cords before the task is finished, because showing them eliminates a realistic element of uncertainty and risk.
Relative estimation became increasingly good. I.e. when two tasks were estimated with a certain relative difficulty, then over the iterations the times they actually took came increasingly close to exactly that ratio.
The absolute estimation of capacity fluctuated. The values were: 8, 29, 10 (bricks).
In iteration 2 there was one task whose estimate was particularly wrong (No. 06, ID 09-1). To me (HM) this seems to be a typical situation: there were four similar tasks that had been estimated similarly and were scheduled to be done in a row. In such a situation the learning effect should be included in the estimations, i.e. the tasks will be accomplished faster and faster.
The preceding notes led to the following idea (HM): the mapping from abstract estimations to expected times varies a lot. This variation may be caused by factors such as:
Mood of the team (optimistic, pessimistic)
Organization of tasks (repeating similar tasks in a row by the same people, parallel work etc.)
Individual strengths of the people who are scheduled for the tasks.
It would be interesting to make this mapping from abstract estimations to expected times explicit and to predict it well. How can we do that?
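One simple way to make the mapping explicit would be to compute, after each iteration, the observed conversion factor (seconds per brick) from the completed tasks, and to predict the next iteration from a running average of these factors. A minimal sketch; all numbers are hypothetical placeholders, not data from the game:

```python
# Make the bricks-to-time mapping explicit: per iteration, compute the
# observed seconds-per-brick factor and predict the next one from the
# average of all factors seen so far. Data below is hypothetical.

def seconds_per_brick(completed):
    """completed: list of (estimate_bricks, actual_seconds) tuples."""
    total_bricks = sum(b for b, _ in completed)
    total_seconds = sum(s for _, s in completed)
    return total_seconds / total_bricks

history = []  # conversion factors of past iterations

for completed in [
    [(2, 40), (3, 70), (1, 25)],   # iteration 1 (hypothetical)
    [(2, 30), (5, 80), (3, 50)],   # iteration 2 (hypothetical)
]:
    factor = seconds_per_brick(completed)
    history.append(factor)
    prediction = sum(history) / len(history)
    print(f"observed {factor:.1f} s/brick, predicting {prediction:.1f} s/brick")
```

A refinement would be to keep separate factors per task type or per person, which would capture the mood and organization effects listed above.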
[#1] We had intended to use planning poker as described by James W. Grenning (Planning Poker). But the developers intuitively used the cards in a different way: they just sorted the story cards and put the poker cards onto the story cards. This made it easy to correct a re-estimation by moving the poker cards. Afterwards we had the impression that planning poker would have been inappropriate for this team anyway: it was untypically harmonious and found consensus easily.
[#2] The students themselves realized afterwards that they had compared the stories within an iteration, but not between iterations.
[#3] We should have a look at the data and do an exact linear regression. Probably we will get a very good correlation coefficient.
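The regression needs nothing beyond the closed-form least-squares formulas. A sketch with hypothetical (estimate, actual time) pairs standing in for the real task data:

```python
# Ordinary least squares fit actual = a * estimate + b, plus the Pearson
# correlation coefficient r. The pairs below are hypothetical; the real
# (bricks, seconds) task data from the game would be substituted here.
tasks = [(1, 14), (2, 31), (3, 48), (5, 77), (8, 130)]

n = len(tasks)
sx = sum(x for x, _ in tasks)
sy = sum(y for _, y in tasks)
sxx = sum(x * x for x, _ in tasks)
syy = sum(y * y for _, y in tasks)
sxy = sum(x * y for x, y in tasks)

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope (seconds per brick)
b = (sy - a * sx) / n                            # intercept (seconds)
r = (n * sxy - sx * sy) / (
    ((n * sxx - sx * sx) * (n * syy - sy * sy)) ** 0.5
)

print(f"slope={a:.2f} s/brick, intercept={b:.2f} s, r={r:.4f}")
```

With real data, r close to 1 would confirm the impression that relative estimation became very good.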
[#4] We should also do a linear regression with no constant summand, i.e. for linear functions through (0, 0). Looking at the picture, I have the impression that this might give an indicator of the team mood.
[#5] If another team had been playing the game at the same time, a slight competition might have arisen. This would probably have reduced the length of the spikes.