This article is mostly adapted from a Mastodon thread I recently made about the issue. If you follow me on Mastodon, this will mostly be a repeat. Anyway, without further ado, let’s talk about survey design and why mistakes from statisticians are more common than they should be.
Survey design is the study of making surveys and how they can be used to draw conclusions and gather data. Getting this right is paramount: A poor survey design could make the data quality so bad that you have to restart, or the survey ends up as an example of how not to do survey design. This is why you edit and revise and sometimes have others critique a survey before pushing it out. However, many common major mistakes are still made, and this is not only costly because of the revising and fixing involved, it is also damaging and erasing.
Since my training is in mathematics, my focus will primarily be on statisticians with significant training in mathematics. It is no secret that mathematics and statistics lean heavily toward cisgender white guys. This means that most of the time, the statisticians making a survey are cisgender white guys. This is not a problem in and of itself. The problem arises from homogeneity of thought, being oblivious to their own biases, and being hostile towards those who attempt to correct this.
For example, you might have a survey that asks about gender. The categories might be Male and Female. Because no one on the team is even aware of nonbinary people, this survey might go through with no problem. This is one way that common mistakes in survey design go unfixed. All of the statisticians themselves are oblivious to their own biases, and so the mistake does not get corrected until a participant leaves feedback about it, someone nonbinary catches it and tells them, or, in rare cases, someone on the team realizes without any feedback that a big mistake was made. You do not know what you do not know, and the unknown unknowns can absolutely devastate a survey. This also contributes directly to erasure: If you do not offer an option that matches how a participant identifies, their identity is erased. This is especially dangerous in some cases, but I am leaving the discussion of this to another part in the series.
Another common mistake is not knowing the psychology of taking a survey. Most statisticians do not have much psychological knowledge. Now, you can absolutely take courses on the psychology and sociology of survey design. Here’s the problem: Those classes are geared towards psychologists and sociologists, so they will not take into account the skill set of someone mathematically trained in statistics, and most math people avoid both psychology and sociology courses anyway. It’s partially why they chose to study mathematics or statistics. They want a subject about rigor and numbers and proofs, not about people and society. Obviously there are exceptions like myself, but the bulk of statisticians are more interested in data than in psychology. Now, not being interested in psychology is not something that I am going to judge a person for. The problem is that psychology is very important in a survey.
What questions do you put first and last? Do you talk about sensitive topics? Are your survey takers safe to take the survey? How do you word the questions? Honestly, this topic needs its own part, but the point is that taking a survey is highly psychological. Pretending it isn’t could kill your own survey. If participants see poor wording, long surveys, or topics that they are sensitive to, be prepared to have a high incomplete rate. Statisticians do get some training on this, but the lack of psychological knowledge shows in how frequently I see long surveys that give no indication of an estimated time. That is not even to mention that you must tailor this to each method you use to run your survey. If you survey both in person and online, each mode should incorporate the psychological and sociological considerations appropriate to it. For a sociological example, you generally want to avoid running surveys during holidays observed by your target population because people will respond less. So, you could avoid this in person by running the survey earlier or later, and online by waiting to publish the survey or publishing it before the holiday. Remember how I said that most statisticians are cisgender white guys? Yeah, they have a habit of not knowing when many holidays are, so this is one way that homogeneity of thought and obliviousness can hurt your survey design.
This homogeneity of thought manifests itself first in mathematical training. Most of the statisticians that I am referring to have training almost exclusively in mathematics or computer science. That is not a problem in and of itself, but if you are doing a survey about a topic outside of mathematics or computer science, this homogeneity of thought within a space can absolutely cause grave harm. For example, if you’re doing a survey on poverty, your lack of knowledge of sociology will unintentionally bias you. You might not be aware of the nuance of poverty, and as a result you might not capture the data that you wanted to capture. Someone could be above the poverty line but still be in poverty because the poverty line is set too low. Or you could fail to account for systemic poverty and only discuss personal poverty when you meant to capture both. Or you could fail to ask questions about poverty that are extremely relevant because you do not have the training to realize that they are questions you should ask. This is why statisticians work with domain experts, and are frequently domain experts themselves, to partially mitigate this. Of course, asking statisticians to be omniscient is laughable and extremely unrealistic, and no survey will ever be perfect. There will always be limitations you have to account for, and what works for one population may not work for another. What matters is taking those limitations in stride, doing better, and letting people know about them so that people can make better surveys in the future.
Well, how do you tell statisticians about the limitations of a study? Some will get defensive, others will accept with grace. How you approach this depends on the person. However, one thing should be noted: Most statisticians are cisgender white guys. Cisgender white dudes are generally bad at taking criticism from a minoritized group. To give a real-life example of this, there was a person designing a study, and one of the questions was on race. This study was intended to be used internationally. As one of the options for race, he put African American, but he did not use an American-specific term for any other race. I pointed out that for an international study he had used an American-exclusive term and that he should change that. He went off, accusing me of being racist towards him. I was not angry with him or calling him racist, just pointing out that it was something to be fixed. Black people outside of the US are not going to identify as African American because the vast majority, save for some dual citizens, immigrants, and people in other situations I did not describe, are not American. This is one reason why so many surveys are not inclusive: Many statisticians do not take genuine feedback well if you are perceived to be of a lower social status than they are. For cisgender white guys in particular, this reduced tolerance for criticism, combined with the homogeneity of thought that comes from being surrounded by similar people, causes them to think their survey is good to go when it is not.
This is how many problematic surveys continue to exist. This is why it frequently takes blowback for things to get fixed rather than their being fixed at the planning stage. Thankfully, this is slowly changing and inclusion is being incorporated better in surveys. There are a number of other things that I did not touch upon in this post that I hope to cover in later posts.
Stick around for part 2 on Psychology, Ethics, Inclusion, and the dangerous erasure many surveys do! In the future, if you want to see all of my posts on survey design, click the category Survey Design Series!