📍 Mapping dialect variation among US cities in QGIS

This is a write-up of my journey to replicate an assignment for my linguistics class in QGIS. I am new to GIS and to this software, so these instructions may not be best practice. If you see anything I could improve, please let me know!

Installing QGIS

You can download the latest version of QGIS here. For this project I'm using version 3.4.6.

Preparing the data file

I'm using data for this project that I stole from my class, so I can't publish it here. It is a table of survey responses, with each row representing an individual survey respondent from a unique American city, and a column for each of a set of dialectal features those respondents demonstrated. For QGIS to plot these responses on a map, the geographical location of each of these cities needs to be determined, a process called geocoding. It appears to be relatively easy to geocode a data file to determine the locations of place names, such as the city names I have. I'd normally be down to try out a workflow like this, but I don't want to right now for two reasons:

  1. Doing so would mean using plugins, and I don't want my first time using QGIS to be complicated by integrating plugins. I would love to learn about them in the future, but part of my goal right now is to get comfortable with QGIS itself.
  2. This is not my data, and it would be very uncool to send it to any geocoding service without permission.

...so, I got each city's coordinates manually, using WolframAlpha searches. Admittedly a lazy/boring solution, but with some good music it wasn't so bad.

It looks like QGIS can import any filetype under the sun, but for simplicity's sake I exported the new data table as a comma-separated values (.csv) file. To make the import as smooth as possible, separate the X and Y coordinates into different columns and label them "X Coord" and "Y Coord". Here X is the longitude and Y is the latitude.
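If your coordinates ended up in a single cell per city, splitting them by hand gets old fast. Here is a rough Python sketch of the split. It assumes a hypothetical "Coordinates" column holding "latitude, longitude" in decimal degrees. That column name and the sample row are my own placeholders, not the layout of the class data.

```python
import csv
import io

def split_coordinates(rows, source_column="Coordinates"):
    """Yield copies of each row with "X Coord" and "Y Coord" columns.

    Assumes source_column holds "latitude, longitude" in decimal degrees.
    """
    for row in rows:
        row = dict(row)  # copy so the caller's row is left untouched
        lat_text, lon_text = row.pop(source_column).split(",")
        row["X Coord"] = float(lon_text)  # X is longitude (east-west)
        row["Y Coord"] = float(lat_text)  # Y is latitude (north-south)
        yield row

# Hypothetical one-row example; the real survey data is not published.
sample = io.StringIO('City,Coordinates\nAtlanta,"33.749, -84.388"\n')
converted = list(split_coordinates(csv.DictReader(sample)))

# Write the result back out as CSV text with the new column layout.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["City", "X Coord", "Y Coord"])
writer.writeheader()
writer.writerows(converted)
print(output.getvalue())
```

With a real file you would swap the StringIO objects for open() calls and list every column of your table in fieldnames.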

Importing data file

  1. Create a new project by either hitting CTRL+N or navigating to Project > New.
  2. The view box looks really empty and sad, so head down to the bottom of the screen and type "world" into the coordinate box, and then hit enter. There should now be a simple outline of the world on the screen.
  3. Hit CTRL+L or click one of the leftmost toolbar icons to open the Data Source Manager.
  4. Select the "Delimited Text" tab, click on the ellipsis in the upper right-hand corner of the popup window, and select the .csv file.
  5. Type a name in the "Layer name" field, like "Survey Data" or something.
  6. Review the import settings. Make sure "X field" and "Y field" refer to the correct columns, and also make sure it's treating the first few lines of your file correctly (in my case, it guessed correctly that the first row was the column headers).
  7. Click "Add" and "Close" and confirm that there are now dots on your screen in the right places!
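I did all of this through the menus, but as far as I can tell the same import can be scripted from the QGIS Python console. The sketch below is something I have not run against my own project. It assumes a hypothetical file name, coordinates in plain longitude/latitude (EPSG:4326), and the "X Coord" / "Y Coord" column names. Building the URI is ordinary Python; the last few lines only do anything inside QGIS.

```python
from pathlib import Path
from urllib.parse import quote, urlencode

def delimited_text_uri(csv_path, x_field="X Coord", y_field="Y Coord", crs="EPSG:4326"):
    """Build a URI for QGIS's "delimitedtext" provider from a CSV path."""
    params = {"delimiter": ",", "xField": x_field, "yField": y_field, "crs": crs}
    # quote_via=quote encodes spaces as %20; ':' and ',' are left readable.
    query = urlencode(params, safe=":,", quote_via=quote)
    return Path(csv_path).resolve().as_uri() + "?" + query

uri = delimited_text_uri("survey_data.csv")  # hypothetical file name
print(uri)

try:
    # Only importable inside QGIS (for example, in its Python console).
    from qgis.core import QgsProject, QgsVectorLayer
except ImportError:
    pass  # outside QGIS we stop after printing the URI
else:
    layer = QgsVectorLayer(uri, "Survey Data", "delimitedtext")
    if layer.isValid():
        QgsProject.instance().addMapLayer(layer)
```

The xField and yField parameters play the same role as the "X field" and "Y field" settings in the import dialog.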

In the Layers panel (docked at the bottom left of the screen by default), you should now have a layer for the survey data, labeled with the name you chose. As we add layers, this panel will be helpful for editing or hiding them.

Creating filters for each feature

Now we need to tell QGIS which differences in the data it should display. I'm interested in learning a more efficient workflow for this, but the following steps were successful:

  1. Right-click on the survey data layer, and click on "Open Attribute Table".
  2. Now hit CTRL+F, or click on "Select/filter features using form" in the top menu.
  3. You should now be able to type in a search term next to one of your data table's columns. For our first search, let's choose a field with one possible value. The Feature 2 column in my data table represents the respondents who demonstrated monophthongization, so I typed monophthongization into the box next to that column. When you're done typing your filter, click "Filter features" in the bottom right-hand corner.
  4. There should now be a "Filter Expression" in a text box at the bottom of the window. Copy it!
    Mine looks like this: ("Feature 2" ILIKE '%monophthongization%')
    You will use this expression when creating a rule for this layer's symbols. Head down to the next step ⬇️
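A quick aside on what that expression means. As far as I understand it, ILIKE is a case-insensitive match and each % stands for "any run of characters", so '%monophthongization%' matches any cell that contains the word. Here is a small Python sketch of that behavior using made-up rows (the city names and cell values are hypothetical, not from the class data). It only mimics the %term% pattern, not every wildcard ILIKE supports.

```python
def ilike_contains(cell, term):
    """Mimic ("column" ILIKE '%term%'): a case-insensitive substring match."""
    if cell is None:  # an empty (NULL) cell never matches
        return False
    return term.lower() in str(cell).lower()

# Hypothetical rows standing in for the attribute table.
rows = [
    {"City": "City A", "Feature 2": "Monophthongization"},
    {"City": "City B", "Feature 2": ""},
    {"City": "City C", "Feature 2": "monophthongization (partial)"},
]

matches = [r["City"] for r in rows if ilike_contains(r["Feature 2"], "monophthongization")]
print(matches)
```

City A still matches despite the capital M, and City C matches even though the cell contains extra text, which is why this filter is forgiving about how the survey responses were typed.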

Working with rule-based symbols

  1. Right-click on the survey data layer, and click on "Properties".
  2. Navigate to "Symbology", then select "Rule-based" from the top drop-down menu.
  3. We now want to create rules for how QGIS should decide what symbol to show for each city. To do this we need to create a rule that filters for each feature surveyed, and displays an icon for cities that have that feature. The good news: you already have this filter rule copied to your clipboard! Click "Add rule" (green plus sign in bottom right).
  4. Paste the filter rule into "Filter". While you're here, type in a name for this rule, and maybe pick a unique color for it. You can fine-tune these things later.
  5. Click "OK" and then, while still in the Symbology menu, drag your most recent rule above any other rules (if there are no other rules yet, make another one first). Apply your changes and exit the properties popup. You should now see a distinct color on the map that represents the filter you just created!

Now repeat these instructions for the other features in your data. You should be able to adapt the same filter expression for the rest of your rules, editing it to reference the right column and value. If two or more features overlap, edit one of the symbols to be larger than the other, and then drag it above the smaller symbol in the Symbology popup. This way they will both be visible, with the bigger one behind the smaller one. You can also control this in the layers menu.
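Since every rule uses the same expression shape, the adapted filters can be generated instead of retyped. A minimal sketch, assuming hypothetical column names and values for the other features (only "Feature 2" and monophthongization come from my table). It also assumes column names contain no double quotes, and that doubling a single quote is the right way to escape one inside the value.

```python
def ilike_filter(column, value):
    """Return a filter string shaped like ("Feature 2" ILIKE '%value%')."""
    # Assumption: a single quote inside the value is escaped by doubling it.
    escaped_value = value.replace("'", "''")
    return f"(\"{column}\" ILIKE '%{escaped_value}%')"

# Column -> value to search for. Only the "Feature 2" entry is from my data;
# the other entry is a placeholder.
features = {
    "Feature 1": "example value",
    "Feature 2": "monophthongization",
}

for column, value in features.items():
    print(ilike_filter(column, value))
```

Each printed line can then be pasted into the "Filter" box of its own rule.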

On using QGIS in linguistics class

The paper assignment I did in class was tedious, because we had to manually plot the data on a paper map. As I've shown here, a GIS method can be just as tedious! For GIS to be incorporated into an assignment like this, I'd want to iron out the kinks with importing data and filtering symbols. I suggest providing students with a QGIS template file that already has neat symbology, and leaving it to them to properly import data and match it to the rules. This would leave more time for more advanced data visualization, to show off current linguistics research as well as GIS magic. For example, students could download a GIS file with heatmap layers of the same dialect features, allowing them to see more nuanced distributions and see how their work fits into the larger data set.

Anyways, this was a fun side project and a great excuse to learn some QGIS!