The mob is organizing to participate

Companies with work that can be broken into short, repetitive tasks that still require a discerning human to complete are turning to cloud labor to distribute these tasks to workers throughout the world. Amazon’s Mechanical Turk was the first to define the field, followed by more targeted and nimble services like CrowdFlower and txteagle.
I have noticed a number of researchers using these tools to recruit participants and collect data, so I wanted to see whether they would suit my own research. I opted for CrowdFlower because it has a completely unintimidating sign-up procedure that encourages you to play with designing tasks before you push your survey out into the world.
Once you are in, you can go about creating a job. If you are making a survey or trying to field a psychological instrument, the interface suggests that you can get started without first adding data. Most users of these services, however, think in terms of tasks and jobs, so the system expects you to populate the Job with the information you want workers to work on (such as a list of URLs to visit). This is not quite what we want, but I have found the rest of the system doesn’t work if you don’t populate the Job with some sort of data.

So what you can do is create a two-line .csv file containing something like a survey identification number (for example, a header line followed by a single value).
Save it as plain text and upload it to CrowdFlower (if your browser blocks Flash, whitelist crowdflower.com; the uploader depends on it).
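If you would rather script the placeholder file than type it by hand, here is a minimal sketch in Python. The column name `survey_id`, the value `1`, and the filename `survey_units.csv` are all my own placeholders; CrowdFlower does not require these particular names as far as I know.

```python
import csv
import io


def build_placeholder_csv(column="survey_id", value="1"):
    """Return the text of a two-line CSV: one header row, one data row.

    The column name and value are hypothetical placeholders.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column])  # line 1: header
    writer.writerow([value])   # line 2: the single "unit"
    return buffer.getvalue()


if __name__ == "__main__":
    # newline="" keeps the csv module's own line endings untouched.
    with open("survey_units.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_placeholder_csv())
```

The single data row is the one "unit" that every participant will end up judging, which is all the system needs in order to run a survey.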

Now you’re ready to get cracking. CrowdFlower has a very nice little form editor (under the Edit tab). However, you’d probably like to avoid pointing, clicking, and dragging as much as possible. It is also likely you already have the item content for your questionnaire. The thing to do, then, is to skip the GUI form editor and head straight for the CrowdFlower markup language, which gives you XML tags for designing the content of your survey. This is the real gem of CrowdFlower’s platform.
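Since the item content usually exists already, the markup can be generated rather than typed. The sketch below builds one radio-button question per item. The tag and attribute names (`cml:radios`, `cml:radio`, `label`, `name`, `validates`) are written from my memory of CML and should be checked against CrowdFlower's documentation before use; the second item and the Yes/No choices are made up for illustration.

```python
from xml.sax.saxutils import quoteattr

# (column name, question text). The second item is a hypothetical example.
ITEMS = [
    ("individual", "Are you an individual?"),
    ("organized", "Do you see yourself as organized?"),
]
CHOICES = ["Yes", "No"]  # assumed response options


def radios_item(name, question, choices):
    """Build one radio-button question as CML-style markup.

    Tag and attribute names are assumptions to verify against the CML docs.
    quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    """
    lines = [
        f"<cml:radios label={quoteattr(question)} "
        f'name={quoteattr(name)} validates="required">'
    ]
    for choice in choices:
        lines.append(f"  <cml:radio label={quoteattr(choice)} />")
    lines.append("</cml:radios>")
    return "\n".join(lines)


def build_survey(items, choices):
    """Join all questions into one block to paste into the CML editor."""
    return "\n\n".join(radios_item(n, q, choices) for n, q in items)


if __name__ == "__main__":
    print(build_survey(ITEMS, CHOICES))
```

Using lowercase values for `name` means there is nothing to lose if capitalization is dropped from the column labels in the output.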
Once you are through designing your survey, you are ready to Order Judgments (sounds serious, right?). You’ll want to skip the calibration step, which presents you with a dummy copy of your survey and times you as you complete it. You should already know how long it takes to do your questionnaire.

Advanced Settings holds all the action. For Judgments per unit, put the number of individuals you want to fill out your survey. Remember that this whole interface is for microtasks, so it assumes a worker might get a page of 4 or 5 tasks to complete at once. That is not what we want, since all of our questions are in a single survey. Thus, Units per assignment should be 1.

CrowdFlower ties into two different labor communities: Amazon Mechanical Turk and Give Work/Samasource. However, it also has a free internal interface that generates a URL you can give to your participants. For example, the survey I just made asks: Are you an individual?
So while CrowdFlower has many nice features, it isn’t quite suited for psychological surveys. This is hardly a criticism, since it wasn’t designed for this type of task. The main problem is the screen participants see once they’ve submitted the task. It isn’t what I’d describe as a good debriefing. That said, there is a type of task that this interface is quite suited for, which is assessing personality in nonhuman primates. Like the kind of jobs that CrowdFlower was designed for, having a number of raters assess the personality of some apes or monkeys is precisely a matter of getting a number of judgments by each worker (the rater) on the assigned units (the primate subjects).

Lastly, a few tips
- In the CSV file you use to import the units, don’t just include the variables that will appear to the raters in the task information; also include metadata that will help you sort and organize the data later. Examples are stud numbers or database IDs for each animal.
- You cannot edit the survey content once the job is running, so get it right before starting.
- In CML, you can provide an alternative name that will show up as the column label in the dataset. Capitalization of this label won’t be preserved.
- The output gives you some other information about the workers, such as an ID (which I assume is somehow tied to cookies in their browser) and the city they are connecting from.
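
To illustrate the first tip, here is a rough sketch of a units file for a primate rating job that carries metadata alongside the fields raters see. Every column name and every animal record below is invented for the example; substitute whatever identifiers your own records use.

```python
import csv
import io

# Hypothetical columns: the first two would be shown to raters,
# the last two are metadata kept only for sorting the results later.
FIELDS = ["animal_name", "species", "stud_number", "database_id"]

# Invented example records, not real animals.
ANIMALS = [
    {"animal_name": "Koko", "species": "gorilla",
     "stud_number": "0001", "database_id": "A17"},
    {"animal_name": "Milo", "species": "macaque",
     "stud_number": "0002", "database_id": "B42"},
]


def build_units_csv(animals, fields=FIELDS):
    """Return CSV text with one row (one unit) per animal."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(animals)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("primate_units.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_units_csv(ANIMALS))
```

Keeping the column names lowercase also sidesteps the capitalization issue mentioned in the tips.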
photo cc-by Amodiovalerio Verde