

Project MapLemon

Project MapLemon is a corpus for stylometric demographic identification, comprising 54,000+ words from 346 participants. It was originally created as a baseline corpus for linguistic variation among North American English speakers. The corpus contains responses from 30 linguistic backgrounds, 40 US states, and 6+ Canadian provinces. Project MapLemon introduces a new method for collecting data on linguistic variants in natural, digital written text: participants are asked to give directions using a hand-drawn map and to write a recipe for lemonade. In addition to its novel collection methods, MapLemon contains responses from 212 transgender and non-binary people. Analysis of these responses has shown that transgender people write most similarly to their sex assigned at birth, then to their gender, and are dissimilar in their writing to opposite-sex transgender people. Furthermore, the analysis suggests that non-binary people form their own gender category and cannot be classed with any other gender.

An example poster, presented at Text as Data ’22, can be seen below. The poster is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license.

MapLemon data is available on GitHub here: https://github.com/tdmmct/maplemon

Know someone who wants to take the survey? Click here for the link! NOTE: Responses to this survey are unpaid and anonymous.

MapLemon is supported by the EViL Lab at Duquesne University and the Provost’s Digital Innovation Grant. Information about this grant is available here: Project MapLemon.

Recent slides further explaining how MapLemon works are available at this link (click here). The slides were created in Summer ’23; the data in them is not up to date, but the results are the same.

Want to know more about JGAAP? Click here to visit the JGAAP GitHub repository.