NOLA Hackathon 2011/Sunday

Notes from Sunday, 16 October 2011. Part of NOLA Hackathon.

Chad - testing workshop
Brion - git workshop

Wrapup at 6pm -- please leave by 6:30pm

Some goals for the event:

1) Get to know each other, have fun
2) Everyone has a basic understanding of git
3) Every developer has a labs account
4) Labs has first services
5) Juju running in labs
6) Jenkins is up and running
Add your own goals ("Fix my first bug")

Stretch goals - if we make these, we'll be thrilled:

1) Labs runs a full MediaWiki setup
2) Shared gadget demo repository?
Add your own stretch goals ("Get my first fix deployed into production")

Some topics of interest:

glorious CI future
glorious git future
swift media uploads & related
labs
parser
non-MySQL support (dj -> sql server)

Friday's notes archived at NOLA Hackathon/Friday

Saturday's notes archived at NOLA Hackathon/Saturday

Some sort of discussion in the back room

Something about shell requests <- please someone take notes ... I get the sense Mark Hershberger, robla, reedy, & tim are doing an etherpad somewhere else -- please link?

Some other sort of discussion in the lounge

<- please someone take notes

Chad's test training

In kitchen conference room

Audio:

HowtowriteMediaWikiunittests-Oct2011

Testing is absolutely awesome!

We use PHPUnit for most of the testing. The MediaWikiTestCase wrapper class bootstraps the environment for you. Unit tests are meant to test small pieces of code, so you wrap a test case class around a few specific things. This ties in with continuous integration: tests get run automatically and the results are reported back. Currently we don't have very good coverage; it's a 10-year-old code base and many things are kinda scary!

For new code, it's easiest to write the tests first as you're working.
For old code, writing tests requires knowledge of how it's supposed to work -- and sometimes refactoring of things that aren't well self-contained.
- if you want to catch up on old tests, it's best for newbies to pair up or group up with experienced devs

Two ways to write tests:

The Right Way
The way people usually do ;)

Each test should have no more than 1 or 2 assertions, or it's hard to figure out what went wrong. Data providers make your life easier when doing multiple inputs!

data provider function returns an array of parameters, they get passed to the test function

Example test being written for wfShorthandToInteger (in GlobalFunctions.php)
This is a nice self-contained function, so perfect to write unit tests for.
Transforms things like '1g' to a byte count
Tests for classes usually go in a test class named for it: User -> UserTest
For function farms like GlobalFunctions.php we're migrating from the giant GlobalTest class to individually grouped ones, e.g. wfShorthandToIntegerTest

<?php
class wfShorthandToIntegerTest extends MediaWikiTestCase {
    // Data provider: each inner array is one set of arguments for the test.
    function provideABunchOfShorthands() {
        return array(
        );
    }
}

So let's design a test function:

    function testWfShorthandToInteger( $input, $output, $description ) {
    }

the provider can be like:

    function provideABunchOfShorthands() {
        return array(
            array( 'input', 'output', 'description' ),
        );
    }

This is marked with a special comment on the test:

    /**
     * @dataProvider provideABunchOfShorthands
     */
    function testWfShorthandToInteger( $input, $output, $description ) {
    }

There are lots of assertion methods on the test case classes; the most generic is assertEquals(), and you can expect to use it a lot!

Start writing some sets of data for which you have known results. Looking at the function, we see there's a trim() at the start, and a check that returns -1 for empty input. That gives us some inputs to try:

    array( '', -1, 'Empty string' ),
    array( '     ', -1, 'String of spaces'),

^ descriptions are important: a human-readable description helps you figure out what actually broke so you can track it down. Metadata is a love note to the future!

The assertion might look like:

    // PHPUnit's argument order is: expected value, actual value, message.
    $this->assertEquals(
        $output,
        wfShorthandToInteger( $input ),
        $description );

Nice and simple! This test has only ONE assertion... but we'll run it over as many data sets as we want to provide. DO NOT just list out a bunch of assertions in your test function; that'll be hard to maintain and work with!!! Add more data sets to the provider instead, for example:

    array( '1G', 1024 * 1024 * 1024, "One gig" ),
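
For readers who want to try the data-provider pattern outside a MediaWiki checkout, here is a minimal Python sketch using the standard unittest module. The shorthand_to_integer function is a hypothetical stand-in written for this example, not MediaWiki's wfShorthandToInteger; it only covers the behaviour described in these notes (trim, -1 for empty input, a "g" suffix), and the "k" and "m" multipliers are assumptions. The point it illustrates is that the test body holds a single assertion while the data sets live in one table.

```python
import unittest

# Assumed multipliers for this sketch; only the "g" case appears in the notes.
SUFFIX_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def shorthand_to_integer(text):
    """Hypothetical stand-in for wfShorthandToInteger, for illustration only."""
    text = text.strip()
    if text == "":
        return -1
    suffix = text[-1].lower()
    if suffix in SUFFIX_MULTIPLIERS:
        return int(text[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    return int(text)


# The "data provider": one tuple per data set (input, expected, description).
SHORTHAND_CASES = [
    ("", -1, "Empty string"),
    ("     ", -1, "String of spaces"),
    ("1G", 1024 * 1024 * 1024, "One gig"),
]


class ShorthandToIntegerTest(unittest.TestCase):
    def test_shorthand_to_integer(self):
        for text, expected, description in SHORTHAND_CASES:
            # subTest reports each data set separately, so one bad row
            # does not hide the others.
            with self.subTest(msg=description):
                self.assertEqual(expected, shorthand_to_integer(text), description)
```

Saved to a file, this can be run with python -m unittest; a failing data set is reported together with its description.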

What happens when you pass in an unexpected or invalid value? These are great chances to write test cases. :) Try passing in a null! Try passing in a string that doesn't end in one of the expected suffixes! Sometimes you'll find the function needs to be fixed to return a sane value or produce a sane error.

To test things that are expected to throw exceptions, write a separate test and use the @expectedException annotation.

Feel free to be verbose in naming test functions; they should tell a story. Nobody's going to type those test names -- they're just going to see them in output! Brevity doesn't help here.

Running tests...

You must install the PHPUnit libraries via PEAR; try to avoid the version that comes with your distro as it may be very out of date. Don't use the 'phpunit' command itself, because our environment bootstrap is too ugly. :( Use the tests/phpunit/phpunit.php wrapper script:

$ php phpunit.php includes/GlobalFunctions/wfShorthandToInteger.php
PHPUnit 3.5.15 by Sebastian Bergmann.
...
Time: 0 seconds, Memory: 14.00Mb
OK (3 tests, 3 assertions)

Gives you nice little dots when it works; letters for failures, followed by lots of details.

PHPUnit 3.5.15 by Sebastian Bergmann.
...F
Time: 1 second, Memory: 14.25Mb
There was 1 failure:
1) wfShorthandToIntegerTest::testWfShorthandToInteger with data set #3 ('2G', 1073741824, 'Fake One gig')
Fake One gig
Failed asserting that <integer:1073741824> matches expected <integer:2147483648>.
/Library/WebServer/Documents/trunk/tests/phpunit/includes/GlobalFunctions/wfShorthandToInteger.php:10
/Library/WebServer/Documents/trunk/tests/phpunit/MediaWikiTestCase.php:64
/Library/WebServer/Documents/trunk/tests/phpunit/MediaWikiPHPUnitCommand.php:31
/Library/WebServer/Documents/trunk/tests/phpunit/phpunit.php:60
FAILURES!
Tests: 4, Assertions: 4, Failures: 1.

https://integration.wikimedia.org/ci/ -- Jenkins runs these tests automatically after commits, and will record the failures & whine to IRC #mediawiki to let people know. But it's still better to run them before you commit so you don't trigger the alarm! ;)

More documentation is at the Testing portal, especially Manual:Unit testing. We are no longer using CruiseControl.

Looking over Russ's StoreBatchTest...

some cases are fairly biggish, testing a couple things on each step
some of those should be broken up into separate data sets
but some stuff has to go in between
- this can indicate tricky interdependencies in your code -- ideally you test the smallest piece of code possible. Side effects and dependencies are scary :D
- use of @depends annotation can help to organize these into separate functions when you need to do things serially

Fixing broken tests...

on new code, probably means the code's wrong cause you wrote the test first. ;) Fix it!
on old code -- you may have written the test on incorrect assumptions. Check that too!
- looking over the function is a great chance to make sure it's clear and well-commented code too! don't be afraid to add comments while you work. If it's unclear in <5s of looking at it, add a comment. :) They're cheap/free.

Deprecation of old code -- notes! Use wfDeprecated() to trigger warnings. See Deprecation <- needs more cowbell

Writing tests for existing code can be a time sink, so beware when diving in. Work with a buddy! Mark things as incomplete if they don't quite work -- it can be useful to get them in for a start.

FileBackend

(archived from Etherpad)

FileRepo operations

publish
- Move dest -> archive
- Copy source -> dest
- Optionally delete source
store
- If dest exists, check overwrite mode
- Copy source -> dest
- Optionally delete source
storeTemp
- Generate name
- Fail if dest exists
- Copy source -> dest
append/concatenate
freeTemp
- Delete a file if it exists
fileExists
delete
- Either:
  - Move file to deletion archive, overwrite same
  - Delete file
getFileProps
enumFiles
cleanupDeletedBatch
- Delete file
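
The publish and store sequences listed above can be sketched over a toy in-memory store. This is a Python sketch under loud assumptions: the dict-based storage, the function names, the boolean overwrite flag, and the choice to skip the archive step when the destination does not exist yet are all hypothetical and are not the FileRepo API. Only the ordering of the steps comes from the notes.

```python
def publish(files, source, dest, archive, delete_source=False):
    """Toy model of 'publish': archive the old dest, then copy in the new file."""
    if dest in files:
        # Move dest -> archive (assumption: skipped when dest does not exist).
        files[archive] = files.pop(dest)
    # Copy source -> dest
    files[dest] = files[source]
    # Optionally delete source
    if delete_source:
        del files[source]


def store(files, source, dest, overwrite=False, delete_source=False):
    """Toy model of 'store': check the overwrite mode, then copy."""
    if dest in files and not overwrite:
        # If dest exists, check overwrite mode (here just a boolean).
        raise FileExistsError(dest)
    # Copy source -> dest
    files[dest] = files[source]
    # Optionally delete source
    if delete_source:
        del files[source]


# Usage: publishing over an existing file keeps the old version in the archive.
files = {"tmp/upload.png": b"new", "public/a.png": b"old"}
publish(files, "tmp/upload.png", "public/a.png", "archive/a.png", delete_source=True)
```

After the call, the old contents sit under the archive key, the new contents under the public key, and the temporary source is gone.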

FileBackend basic operations

src is an mwrepo:// or a file:/// URL; dest is an mwrepo:// URL.

copy -- fails if the destination exists.
- ignoreMissingSource
- overwriteDest
- overwriteSame - does not actually overwrite, just checks and passes if they are same
- source
- dest
delete -- delete src (ignores zone and destination)
- source
move
- ignoreMissingSource
- overwriteDest
- overwriteSame - does not actually overwrite, just checks and passes if they are same
- source
- dest
concatenate
- source array
- dest
- overwriteDest
fileExists
- source
getFileProps
- source
enumFiles
getLocalCopy
- source
streamFile
- source
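
To make the copy flags above concrete, here is a small Python sketch over a dict standing in for storage. It is only an illustration of the semantics as written in these notes; the function signature, the DestinationExists exception, the return value, and the precedence of overwrite_dest over overwrite_same are assumptions, not the FileBackend interface.

```python
class DestinationExists(Exception):
    """Raised when a copy would clobber an existing destination."""


def copy(files, source, dest, ignore_missing_source=False,
         overwrite_dest=False, overwrite_same=False):
    """Toy copy over a dict; returns True if data was actually written."""
    if source not in files:
        if ignore_missing_source:
            return False  # nothing to copy, but not an error
        raise FileNotFoundError(source)
    if dest in files and not overwrite_dest:
        if overwrite_same and files[dest] == files[source]:
            # Does not actually overwrite: contents match, so just pass.
            return False
        # Default behaviour: fail if the destination exists.
        raise DestinationExists(dest)
    files[dest] = files[source]
    return True


# Usage: "b" already holds the same bytes as "a", "c" holds different bytes.
files = {"a": b"x", "b": b"x", "c": b"y"}
wrote_new = copy(files, "a", "new")
wrote_same = copy(files, "a", "b", overwrite_same=True)
wrote_over = copy(files, "a", "c", overwrite_dest=True)
```

A plain copy onto "c" without overwrite_dest would raise DestinationExists in this sketch, matching the "fails if the destination exists" default.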

GSoC discussion

Topic 1: getting students to start using #mediawiki. MaxSem pointed out that it took a while to prod Salvatore to use the channel rather than PM. Idea for next year: an icebreaker event on IRC.

Neil and Kevin started with weekly calls on Skype, and then switched to daily calls, which worked better. Salvatore pointed out that those speaking English as a second language would have a tough time with Skype.

Sumana asked about the intro mail. Kevin said he procrastinated on sending that mail out. Sumana suggests a "basic media fundamentals" intro.

See User:MaxSem/GSoC analysis.

Salvatore has been working with Roan to include his work with ResourceLoader 2 (circa 1.19 deployment).

Things to look out for:

1. Ensure project scope is sane
2. Enthusiastic, dedicated mentor
3. Double-check that students are actually responding to their feedback

How to get students to adjust their proposals:

Kevin started down one road (Wikipedia 1.0 bot), saw there was potential duplication, and switched.
Salvatore already had an idea based on his work on Gadgets

Retention - sometimes language is a big barrier. Current work on retention is basically about making sure stuff gets into trunk.

Git stuff

Git session (slides & audio)

3pm start -- in the lounge!

Everything in Git is an object, with a SHA-1 hash. Note: you can refer to a revision by the first few characters of its hash.

Object types: blob, tree and commit.
- Blobs contain just a little bit of metadata (size, checksum) and the data.
- A tree is a structured list of nodes (blobs and other trees) with file modes and names, and possibly submodules if you're using them.
- The most important from a user perspective is the "commit", which is a tree, its parent, and the metadata associated with the commit.

Versioned directory tree: note that a "tree" can be messy, with overlapping references to other trees. Brion demos gitk to show the non-linear nature of commit history.

Branching models are variable. This slide shows a reasonably sensible branch strategy, which is to have "develop", release branches, and the master, which only has finished releases. Developers branch from develop most of the time. OpenStack, for example, requires everything to be made in feature branches which get merged.

Repo sharing models. In Git, there is no difference between the server repository and the local repository. When you check out a repo, you actually clone the repo.
- The Singleton. In Git, using "git init" is really cheap and easy, and lets you make versioned local mods.
- P2P Shuffle. The simplest model of using Git is the P2P model, where everyone just has their own local copies, and you ship around patches and do ad hoc pulls from peer repositories. If you have more than two people working on a serious project, don't do it, because you'll go insane.
- Mothership. This model is for "serious" projects. This involves having a central master repo, which isn't special from a tech perspective, but is blessed by convention.
- Pull requests. This is the GitHub (and friends) model. Mothership + shared personal clones. Authorship information is maintained, and GPG signing is supported.
- Push to review. This is the Gerrit style. Similar to the GitHub model, but triggered by pushing to a shared repo.

Extension example. Brion demos the OEmbed extension as something that can be managed in Git. https://github.com/brion/OEmbedConsumer

Step 1: Grab the URL: https://github.com/brion/OEmbedConsumer.git
Step 2: Check it out locally: git clone https://github.com/brion/OEmbedConsumer.git
Step 3: Now to do crazy stuff and go into the .git directory:
- Everything in that directory except for the objects is metadata
- Mostly look but don't touch unless you really know what you're doing
Step 4: git log - basic stuff
Step 5: gitk - now looking at the revision metadata

An aside about branches. Subversion branches are weird: they are just directories. svn 1.5 added mergeinfo, which sorta kinda works, but is actually kinda horrible. Git supports branches. Brion spelunks into the .git directory to show branches.

Fun fact: the "git branch" command tells you what you have, and can be used for branching, but beware: "git branch newbranch" won't switch you to that branch until you "git checkout newbranch". You can use "git checkout -b newbranch" to create and check out a new branch in one step.

Step 6: git checkout -b error-handling
(Brion then makes a local mod to a file in the local repository)

Fun fact about "git diff". The commit model for git is a two-stage process. You can change your working copy, but then you need to add it to a staging area (the "index").
- "git diff", by default, will show you the diff between your working copy and the index.
- "git add" will "stage" a file by stuffing it in the index.
- "git diff --cached" will show the diff between the index and the repo (the last commit).
- "git commit" - as long as something is staged (see above), this will commit to your local repo.

Step 7: git commit

"git remote add mystuff git@github.com:briontest/OEmbedConsumer.git" - this creates a new "remote" called "mystuff" that one can push to, e.g.: git push mystuff error-handling

Step 8 (which should have been earlier): Test the code, see it's broken, make a fix. Then:
Step 9: git commit -a (same as adding all *modified* files then committing)
Step 10: git push mystuff error-handling
Step 11: Use GitHub to send a pull request. Use the GitHub user interface to merge the pull request.
Step 12 (in the other working copy): git pull - pulls from the remote copy

Question: What's the red reset button?

    git reset --hard HEAD

If you really screw things up, you can also reset --hard to the origin:

    git reset --hard origin/master

This will ERASE ALL UNPUSHED LOCAL MODS. To avoid disaster, always work in branches.
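
As a small illustration of the point that everything in Git is an object with a SHA-1 hash: a blob's ID is the SHA-1 of a short header ("blob", the content length, a NUL byte) followed by the content. The Python sketch below reproduces that calculation for SHA-1 repositories; the function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


empty_id = git_blob_id(b"")
print(empty_id)      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(empty_id[:7])  # e69de29
```

The first line printed is the well-known ID of the empty blob, and the second is the kind of abbreviated prefix you can use to refer to an object, as long as it is unambiguous in the repository.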

Rebasing

Scary! Useful when cleaning up a work branch for publishing.

    git rebase -i

^^ interactive rebase. Gives a crazy interface like so:

    pick foo
    pick bar

Commands:
p, pick = use commit
r, reword = use commit, but edit the commit message
e, edit = use commit, but stop for amending
s, squash = use commit, but meld into previous commit
f, fixup = like "squash", but discard this commit's log message
x, exec = run command (the rest of the line) using shell
If you remove a line here THAT COMMIT WILL BE LOST.
However, if you remove everything, the rebase will be aborted.

Rebase represents a big difference between Git and Mercurial. In Mercurial, rebases are considered kinda bad because of the ability to screw up history.

ACTION ITEM: need to make sure that Gerrit supports this workflow.

"git stash" - hipster mustache feature that makes the first album was better. GIT STASH! GIT STASH! GIT STAAASH! It rocks, use it!

Multiple repo setup:
- One repo per extension, plus a repo for core
- May use the "repo" tool from the Android project

Brion is working with Siebrand to see if separate repos will work for them. They believe they've worked out something workable. However, even with a helper tool, if you have an update that touches 300 repos (extensions), it's very, very slow.

Copies on GitHub and Gitorious are being considered. An official MediaWiki copy is possible, so that you don't have to use the quota space on your own account. Everyone who is a registered developer and set up with Subversion now might get their own repositories. Registered accounts with captchas would be able to do push requests as an alternative to posting patches in Bugzilla.

In other news, we are 26 commits away from 100k!

http://progit.org/2011/07/11/reset.html
^ this should be helpful background for git reset / git rebase stuff!

Max's demo

Special:ApiSandbox extension: a form to generate API queries from a bunch of parameters and interactively return the results. Handy for gadget & bot developers to adjust a query before sticking it in their source! Several of us note we wished we had this recently :D Needs JS front-end review, otherwise should be ready to land!

Wrap-up

Poke Sumana with any feedback! Feedback helps us!

[brion] we got git talk done!
- estimate git popularity better ;)
[ben] puppet configs for swift media testing well on its way!
- enormous amount of planning and work done on the swift & filerepo refactoring & such!
[ben] fixed something in UW :D
[max] ran GSOC discussion -- lots of good talk on improving for next year!
- [max] demo'd api sandbox -- awesome!
[rlane] stole commit 100k from chad
[rlane] installed 20-something VMs on labs, got a bunch of people set up, fixed lots of related bugs
[tim] lots of work on filebackend/filerepo refactoring
[dj] can hand off azure file stuff to those other guys now :D
[dj & ben/ms] working on getting sql server all working on trunk! thx to max for some installer help!
- ^ getting an ecosystem to keep things running more consistently
[markus] good planning on improving communication & access for 3rd-party devs (archived notes from sat)
[salvatore] showed gsoc work to roan, talking about integration, stronger plans to work on
[roan] committed salvatore's patches
[roan] attempted to hack inline diffs into gerrit -- it's harder than it looked. ;) needs more work
[mark b] completely redid varnish management in puppet, integrated mobile & bits in prep for new eqiad and esams deployments! very close to done.
[robh] ^ helped all those guys doing stuff, helped chad w/ getting tests running
[chad] integration.mediawiki.org !
[hexmode] got through a bunch of the 1.18 triage issues, started talking through some of the shell issues -- MUCH quicker than over irc or phone!
[robla] seeing momentum on swift, seeing the gsoc discussion very helpful
[kevin] feedback & prep
[leslie] made first puppet checkin! learned a bunch about the systems
[roan] log sprint -- fixing up some apache log error issues with only slight breakage ;) -- clean up logs to make it easier to find the real errors
[sam] shell bugs, misc mobile bits, etc
[leisa] getting up to speed! from nothing to having a working dev environment. impressed by seeing how we deal with all these issues (testing, growth, object stores etc)
[russ] swift weekend! documentation & tests gotten in good shape, identified a bug with tim
[daniel] a number of RT tickets got resolved much quicker in person!
[aaron] swift w/ the other folks & the 1.18 regressions
[asher] varnish stuff w/ mark etc; packaging cleanup; talking about plans for geo search etc
[mark m_3] scale testing, started to get things going on the labs stack -- strong momentum
[ct] lots of little things, went well :)
[erik] learned about labs & puppet stuff. committed some code for contest ext; learned more about git
[dana] learned about git and hiphop :)

Also, Kapil and Mark got some initial juju architecture up, but couldn't get juju to launch instances. It's back in our court; we need to upgrade OpenStack for them to continue, since there are bugs in OpenStack Cactus that were fixed in Diablo.

From Kapil:

"That's about the nutshell, it looks like that we'll need to either to wait till 12.04 LTS, or the labs install openstack from a ppa. I got a chance to show some of the folks working on swift media, the ease of deploying swift with juju.

Even though it wasn't a success out of the box getting juju on the labs, i'm hopeful it will get some traction once the labs are updated. Twas, lots of fun meeting up with the wikimedia folks."

Mark adds:

"sounds right to me too... might be worth adding in the goal for what we're trying to do: Provide a simple way for wmf devs to spin up stacks of services within the labs environment. This'll particularly help with developing and testing mediawiki features that're hard to do in a laptop environment... i.e., squid/varnish integration, uploads to nfs mounts, multiple nodes on the same wiki, wiki nodes working with mysql replication too. The ultimate result is code that's better tested before hitting production.

I'll keep in touch with Ryan to follow the labs status and see if we either need to work around the openstack version issue or wait for an upgrade... totally depends on when he's planning to upgrade to diablo.

Awesome meeting everybody and getting a chance to work together!"

-- END -- \o/