Project MapLemon is a corpus for stylometric demographic identification, comprising 21,000+ words from 185 participants. It was originally created to provide a baseline corpus for linguistic variation among North American English speakers. The corpus contains responses from 10+ linguistic backgrounds and 40+ US states and Canadian provinces. Project MapLemon introduces a new method of collecting data on linguistic variants in natural, digital written language: participants are shown a hand-drawn map and asked to give directions using it, and are also asked for a recipe for lemonade. In addition to its novel collection methods, MapLemon contains responses from 91 transgender and non-binary people. Analysis of these responses has shown that transgender people write most similarly to their gender rather than their sex assigned at birth, and furthermore that there may exist a “trans accent”. Currently, Project MapLemon is seeking to expand the corpus, particularly by recruiting Male-to-Female (MTF) transgender respondents; the Project is also looking to analyze responses from non-binary individuals in a way that avoids binarism.
A recent example poster can be seen below. The poster is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike license.
Map Lemon data will be available to the public soon.
Map Lemon is supported by the EViL Lab at Duquesne University and the Provost’s Digital Innovation Grant. Information about this grant is available here: Project Map Lemon.