MapTCHA Notes :tea:

Some description summarising aim and the general idea

Aim: to protect web systems from spam while using the labour of captcha solvers to improve OpenStreetMap.

General idea: We have AI systems like fAIr that recognise buildings. They are not good enough to directly integrate into the map, or even give directly to mappers. On the other side, we have spammers creating bot accounts.

Some buildings are missing on OpenStreetMap completely, others were mapped wrong, and a few have been changed or demolished since they were mapped.

We would ask accounts who are suspected of being spammers to solve a challenge, asking them to click on the buildings in a selection of images. A small share of the images would be known positives and known negatives - used to validate whether the user is a bot; the rest of the images would be unknown - this is where we get free work from the user. The user would not know which are which. Ideally, the known positives/negatives would be examples that are known to be tricky for existing computer vision software.

This isn't only for buildings: we could validate the existence of other objects that are visible from aerial imagery, like zebra crossings, roads, etc., or objects that are visible from street-side imagery (like Panoramax or Mapillary), such as benches, access restrictions, speed restrictions, road signs.

We could also validate objects, for example the shapes of buildings if the AI prediction doesn't match what's mapped in OSM (has the building changed since it's been mapped? Was the mapping not done right?).

We would gather multiple votes on a single validation, requiring a minimum number of votes and minimum percentage of positive consensus on validation before sending the validated points to mappers. For example, if 12 users have seen an image and 80% agree that there's a building there, which OSM doesn't have, we could send it to maproulette to be mapped.
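As a minimal sketch of that voting rule (the function name is ours, and the 12-vote / 80% numbers are just the example above, not a fixed design):

```python
def ready_for_mapping(votes, min_votes=12, min_agreement=0.8):
    """Decide whether a candidate detection has enough consensus.

    votes: list of booleans, True meaning "yes, there is a building here".
    Returns True when the detection can be sent on (e.g. to MapRoulette).
    """
    if len(votes) < min_votes:
        return False
    return sum(votes) / len(votes) >= min_agreement
```

With 12 votes of which 10 are positive, agreement is ~83%, so the example above would pass the 80% threshold.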

After using it on OSM we could expand to general use: sell spam protection on one side and data validation on the other, and use the validated data to improve the training of computer vision systems.

Minimal mockup

We can use sample data from a Mapswipe challenge, e.g. https://web.mapswipe.org/#/en/projects/-O7hFcC2pKTnTh01SGds

The mockup would have a dozen images. We would have labeled known positives and negatives, and a few 'fake' unknowns that we would collect votes on.

We could reuse code from open-source CAPTCHAs. Altcha seems popular but doesn't implement puzzle solving.

Alpha version

  • What questions are we asking in the interface?
    • Click all of the images where the red outline covers a building (Stuart)
    • Click on correctly outlined buildings (Guillaume)
    • Click on the images with correctly outlined buildings (Anna)
    • Does the shape outline a building? (MapSwipe)
  • Would the interface have an "instructions" button? i.e. how should one behave in uncertain cases? → For the test we could just filter "easy" images (that is, where the category is obvious)
    would this make the app less swift?
  • Meaning of the inputs we get from users (hypothetical question: click all of the images where the red outline covers a building)
| Image category | Sample image | Expected response | Meaning |
| --- | --- | --- | --- |
| TP | [image unavailable] | Yes / Agrees | Expected building (labels = prediction) |
| FP | [image unavailable] | No / Doesn't agree | No building in the labels; there shouldn't be any where outlined |
| FN | [image unavailable] | Yes / Agrees | Expected building (it's in the labels, but not predicted) |
| TN | [image unavailable] | No / Doesn't agree | An image with no outlines (not in the labels, not in the prediction) |

Shall we create fake TNs with random outlines in the middle of nowhere? Or are we happy with having empty figures for this category?
Actual TNs found by the algorithm (there aren't many more):

[two example TN images unavailable]

Test alpha version with volunteers

We want to send the link to the working prototype (GitHub Pages site) to a "good" number (30-40? hundreds!!) of volunteers.

Are we running an A/B test?

  • steps to organise this:
    • all ready by Jan 13th
    • send link :mailbox: to invite people to take part; we shall write instructions for this, maybe update the README by then??
    • give one week time to try it
  • should we make sure that people do not see repeated images? (relevant?)
  • would users see the same images?
  • how many images would they see?

Number of images per "session" (9? same for the swipe type?) and among these what is the proportion of:

  • known/unknowns (are we using these in the first version?) we do not have "unknown" predictions yet!
  • true/false/positives/negatives (is this actually relevant??)
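One way to assemble a session mixing known and unknown images; every count and name here is a placeholder for the open questions above, not a decision:

```python
import random

def build_session(known_pos, known_neg, unknowns,
                  n_images=9, n_known_pos=2, n_known_neg=2, seed=None):
    """Pick a session of n_images: a few known positives/negatives used
    to score the user, the rest unknowns we collect votes on.
    The shuffle ensures the user cannot tell which images are which."""
    rng = random.Random(seed)
    session = (rng.sample(known_pos, n_known_pos)
               + rng.sample(known_neg, n_known_neg)
               + rng.sample(unknowns, n_images - n_known_pos - n_known_neg))
    rng.shuffle(session)
    return session
```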

SURVEY
We are setting up a questionnaire to get feedback from the users.
Type of questions (easier if it is not qualitative input, for inferring some data out of it).

Questions below are of the type "strongly agree/agree/neutral/disagree/strongly disagree"

  • About the user's background/knowledge of GIS stuff/origin/age?!?

I have identified features (like buildings) from satellite imagery before

  • pertaining the interface look
    • clear
    • better swipe/grid

I prefer the grid/swipe format

[@stuart make this not an agree/disagree; only 2 options here, or 3 if you add 'neither']

  • about the images
    • is the zoom level OK?
    • are buildings clearly outlined? (quality of prediction)

I could easily identify the detected building
(You can add details in last question box)

I would like more/less zoomed imagery
(You can add details in last question box)

  • understandability of the task

The user would benefit from further instructions

  • How tedious do you find this? (comparing with the experience of other CAPTCHAs, for example)

I find this CAPTCHA tool very cool

  • What we missed (open question)

Is there anything we have missed and you would like to see in MapTCHA?
(For example: option for translation to other languages, a refresh/instructions/"skip" button)

  • General feedback (open question)

Anything you would like to suggest, feedback, or comment on

(Technical) Questions :question:

  1. How do we generate the images?
  • from fAIr → buildings, more?
  • detected feature points or outlines (buildings, zebra crossings) etc. → Mapillary challenge?
  2. Where is it going to be hosted? → dev.openstreetmap.org
  3. Funding?
  4. How do we set the scores, weights, thresholds for validation? Start with something arbitrary and iterate?
  5. Images: tiling + associating labels with the satellite image
    I have the RGB image and the (polygonised) prediction; I can glue them on top of each other (a screenshot is not viable as we want many of them, so I can create a matplotlib figure and cut it with clipping tools + a grid). I can also find a way to match against the labels to check true/false, but this is a bit trickier.
  6. How to choose TP/TN/FP/FN? By visual inspection? How can we do that for 100s of images, later on too?
    A temporary script does this for fAIr imagery. It has problems with TN, which are tricky to generate automatically.
  7. Which zoom level to choose? 19 has plenty of footprints, 21 fewer, though.
  8. The zoom level affects the number of buildings available (obviously), but, less obviously, it affects different regions in different ways (depending on the "average" size of the buildings there), so we might need to subsequently split the original zoom tiles in order to show only a couple of buildings.
  9. What makes the humans human in this task? If it's the action of clicking, then we have to deal with the situation where none of the images are clickable (i.e. either no building outlines are present, or the user doesn't see any).
    Either a SKIP or Refresh button could do the job?
  10. Rules for the "goodness" of a building: is it for the user to decide, or are we giving instructions? (With examples)
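The zoom-level questions above come down to slippy-map tile arithmetic. A sketch of the standard OSM lon/lat → tile conversion (Web Mercator):

```python
import math

def deg2tile(lat, lon, zoom):
    """Standard OSM slippy-map tile indices for a WGS84 lat/lon."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y
```

Each extra zoom level quadruples the number of tiles, so a tile covers a quarter of the area and roughly a quarter of the building footprints, which is why zoom 21 shows far fewer buildings per image than zoom 19.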

Logo :tea:

  • should be easy to catch
  • should contain reference to: map, matcha tea, maybe safety?
  • should be "iconisable", i.e. scalable to a small size - must be shown in the frontend window

Version used for the FOSDEM proposal (Dec 2024):

[FOSDEM proposal logo image unavailable]

Timeline :calendar:

to present at the next State of The Map?

Meetings and updates :mega:

March 18, 2025 (Zoom)

  • Stuart, Guillaume, Anna

  • Feedback from FOSDEM:

  • Maptcha Version 2

    • the license!
  • discussing the feasibility

Jan 21, 2025 (Zoom)

  • Stuart, Guillaume, Anna

  • Feedback from the test (Mastodon):

    • get rid of the Swipe version?
    • TN outlines yes/no review initial question?
    • zoom level
    • accessibility:
      • language translation
      • color-blind friendly
      • other format than visual
    • add instructions
    • more from the survey?
  • Analysis of the data obtained + survey

  • Set up slides:

    • intro: about us, idea, captchas in gen
    • the data: fAIr / computer vision!? / limitations
    • prototype
    • test's results
    • ideas for version n.2
  • Let's write a paper‽

Jan 7, 2025 (Zoom)

  • Stuart, Guillaume, Anna
  • HOT is informed and happy to hear about it; integration with MapSwipe might turn out useful
  • undiscussed points from last meeting
  • general update / FOSDEM participation
  • mockup test

No of images per category:

  • FN 91
  • FP 83
  • TN 137
  • TP 548

We shall need a database to store people's responses, and decide on rules about what to show to people.

Timeline: by the 13th all ready; send link to potential testers with a week's time to try it; get inputs and analyse them before FOSDEM

TODO:
Stuart → finishes off the app, updates to latest images, and addresses the points above
Anna → starts off the slides, and curates the survey questions
Guillaume → to polish off the images

Dec 19, 2024 (Zoom)

  • Stuart, Guillaume, Anna

Agenda

  • Mockup at alpha stage :tada:
    • GH pages working
    • question in the app: do we have one question or more? → only ask for positives, but we can show something other than buildings, e.g. pools
    • generate the "unknown" images
    • proportion of type of images per window
    • quality of the images
    • question n.10 :arrow_up:
    • else
  • organise the mockup test/demo
    • number of people
    • where to share (personal contact, social media), type of audience/skills relevant? → HOT people, Turing people, OSM community Forum???
    • when → mid Jan?
    • platform → Github Pages?
    • platform for the questionnaire → https://framaforms.org

Dec 15, 2024

Proposal accepted for a talk at Fosdem 2025 (CfP)
See it here:
MapTCHA, the open source CAPTCHA that improves OpenStreetMap
https://pretalx.fosdem.org/fosdem-2025/talk/review/KMFMJ9NSWFYSW3DAGWBKK9BZFG7RBVRV

Nov 28, 2024 (Zoom)

  • Guillaume, Anna

Agenda

  • State of Fosdem application
    • 300 words abstract almost there!
    • image that goes with it
  • logo (see logo section)
  • question 9
  • publication?

Nov (19) 21, 2024 (Zoom)

  • Stuart, Anna, Guillaume

Agenda

  • images we have and what we could preferably have
  • business with HOT - fAIr
  • Fosdem

Update from Stuart on the status of the mockup: he started something, involving also the family :), and the Fosdem deadline is good to push things ahead*.

Discussion about Fosdem and starting the shared document for the application.

"In a lot of different ways this is a design problem, not a technical problem." (Stuart)

[*] Call for participation, geospatial room: https://lists.fosdem.org/pipermail/fosdem/2024q4/003597.html

Oct 18, 2024

  • Anna

To detect buildings by category: map the image into categories and assign each building a score indicating which category it belongs to (using a 90% overlap threshold, or something similar)
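A sketch of that scoring, with axis-aligned boxes standing in for real footprint polygons (a real pipeline would use polygon geometry, e.g. shapely; the 0.9 threshold is the 90% overlap mentioned above):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    given as (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

def categorise(prediction, label, threshold=0.9):
    """TP: prediction overlaps a label enough; FP: prediction without a
    matching label; FN: label without a prediction; TN: neither."""
    if prediction is None and label is None:
        return "TN"
    if prediction is None:
        return "FN"
    if label is None:
        return "FP"
    return "TP" if iou(prediction, label) >= threshold else "FP"
```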

[figure: find_TPTNetc]

Oct 15, 2024 (Turing)

  • John, Anna

Why is it so hard to generate (a ~huge number of) images for testing MapTCHA? [See questions 5-6 above]

Cases we usually see online are generally used for image classification tasks, while building detection is an image segmentation task.

[figure: class_seg]

This makes it more complicated to create than, let's say, a typical captcha like this one:

[figure: captcha_classic]

POSSIBLE SOLUTION: we ask the user to draw the polygon. This would be minimal editing, i.e. clicking on 4 (maybe 6, not many more) vertices that make up the polygon of the building.
We get the input from more than one user (2? 4?) and set a threshold on which they agree → this becomes the "labels" that we obtain from the users, which we can use as TP to overlay with the predictions (to assess them, as a secondary output).
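A minimal sketch of aggregating several users' clicked vertices into one consensus polygon, under the strong assumption that every user clicks the same corners in the same order (a real aggregator would first have to match and order the vertices):

```python
def average_polygon(drawings):
    """Average the polygons drawn by several users.

    drawings: list of polygons, each a list of (x, y) vertex tuples.
    Assumes all users clicked the same number of corners, in order.
    """
    n_vertices = len(drawings[0])
    assert all(len(d) == n_vertices for d in drawings)
    return [
        (sum(d[i][0] for d in drawings) / len(drawings),
         sum(d[i][1] for d in drawings) / len(drawings))
        for i in range(n_vertices)
    ]
```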

BUT

this would imply an analysis of the inputs, which doesn't fit with the prompt human/not-human detection that we need for distinguishing users from robots.

THEN you WOULD anyway need a (fast) yes/no type of input for the human-recognition step [we could eventually use the idea right above for the user data input].

So, where does this take us?
Could we maybe set up 2 steps: an initial one with questions like "click on the images that contain a building", as in the second figure above (we would find "easy" cases for this‽), and a second step where people draw the buildings, to get data from the users?

Note:

This MapSwipe Web project seems to do exactly that:
[figure: mapSwipe_sample]
I do see how this one can be problematic too, and wonder how they treat the input they get from users.

:bulb: IDEA !!! :bulb: maybe we can take inspiration from the above and use only one detection (building prediction outline, which can be correct or not) per image! After all, this is what fAIr currently accepts as feedback input (on their predictions).
This should be easy to generate (extract one outline at a time, zoom to it, buffer with background, and export as a tile) and also to overlay with the labels mask/vector to filter by T/F category.


Oct 3, 2024 (Zoom)

  • Stuart, Anna

Questions/initial thoughts from Stuart:

  • How many people would use the captcha every day? (How many new users does OSM have daily? Does the captcha appear also for editing the wiki? Other cases?)

  • Where is it going to be used, OSM only or more widely?

  • He read an article recently about bots getting very good at solving the street level view ones

    AI bots now beat 100% of those traffic-image CAPTCHAs

    :arrow_down:
    if you make your own captcha, you would almost hope that someone creates a bot that cracks it, then you could actually use their algorithm, as they found the solution for you (!!!)

THERE ARE 2 (almost competing) THINGS TO FIGURE OUT:
1. how secure you can make it - otherwise it's useless, as it can be used by bots
2. how to get good data out of it and gain good info from users

it has to work as a security measure before it can be used for data collection

And this takes us to two different cases: 1. make something that works just to recognise if someone is human or not, 2. to obtain data on unknown cases (i.e. help in computer vision tasks)

The first case needs labels, otherwise you can't say whether people are right or wrong [i.e. you need to have predictions for areas where you already know where the buildings are]. In the second case we obtain new data from the users' input.

What is needed

  • sample data
  • little interface
  • test with a high number of people (volunteers)
    to differentiate human/not-human you see how many of the volunteers are successful
    you need an answer from many people (30/40?) so that you can make sense statistically (Google has huuuge numbers)
  • test with bots??
    if it's going to be used only within OSM, then there wouldn't be many people that want to crack it, if compared to widely used captchas
  • programming language: javascript (because of the frontend)
  • server for storing images and results (this can be done in whatever language, python)
  • a way to integrate with the existing OSM logging flow
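To see why something like 30-40 answers is a rough minimum for a statistical signal, here is a normal-approximation margin of error for an observed proportion (a sketch; the 35-volunteer figure below is illustrative):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p observed over
    n independent responses (normal approximation; p=0.5 is worst case)."""
    return z * math.sqrt(p * (1 - p) / n)
```

With ~35 volunteers the margin is about ±17 percentage points, so small tests only give a rough signal; Google-scale numbers shrink it to a fraction of a point.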

Would Stuart want to take part in this :star2: project?

Acknowledging that from December to March he is on paternity leave, he can help build a prototype interface in one or two afternoons of work.
It would temporarily sit on Github.
Anna to provide him with imagery for this [~100 images for each of the four TP, FP, TN, FN categories, with a couple of buildings per tile].
catch-up in person at November team-meeting in Glasgow (Stuart lives in Edi).

Sept 27, 2024 (Zoom)

  • Anna, Guillaume

Discussion of general idea and mockup concept.

TODO list :pencil:

  • get in touch with Omran (Benni?) to discuss the feasibility and check interest
  • investigate and learn (or find someone that can learn/knows) the tech to do this - Anna, in progress
  • start github repo
  • make a little plan with a graph of the idea

Sept 16, 2024 (London)

  • Anna, Guillaume

Guillaume proposed a name, and an idea for the logo :tea:.
First draft:

[draft sketches: NotesLDN_1, NotesLDN_2]

Useful links

https://en.wikipedia.org/wiki/CAPTCHA

https://github.com/altcha-org/altcha Open source CAPTCHA "GDPR compliant, self-hosted CAPTCHA alternative with PoW mechanism and advanced anti-spam filter." but (proudly!) doesn't implement puzzle solving.

References, literature :book:

Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2
Andrew Searles, Renascence Tarafder Prapty, Gene Tsudik
https://arxiv.org/abs/2311.10911

(An article covering that paper)
https://boingboing.net/2025/02/07/recaptcha-819-million-hours-of-wasted-human-time-and-billions-of-dollars-google-profit.html

ReMAPTCHA: A Map-based Anti-Spam Method that Helps to Correct OpenStreetMap
Stefan Keller, University of Applied Sciences, Rapperswil / Switzerland · sfkeller@hsr.ch
GI Forum 2014
https://gispoint.de/fileadmin/user_upload/paper_gis_open/GI_Forum_2014/537545020.pdf