This is a writing and reading website, and I am very deliberate about keeping my posts and other content to those topics. To me, it would be a sort of bait-and-switch if I drew you into the site with promises of fantasy and science fiction, and then started using the site’s blog as a platform to espouse my views and perspectives on totally unrelated topics. You came here to read about starships and warlocks, not my political preferences or what brand of soap I prefer.
Despite that, what follows is an essay I’ve written on statistics and their role in modern communication. Since it’s not really endorsing a particular platform, and it is related to communication (which is what this site is all about, and eventually there will be reviews of non-fiction works on the site), I don’t think this strays too far from my base promise to you as readers. In my non-author roles of scientist and engineer, I deal with huge amounts of data, and I have found that one of my personal pet peeves is misinformation. Hopefully, this essay will help you be a little better informed, or at least help you more readily identify when you’re being misled.
If this isn’t something that interests you, then you are under no obligation to read it. Head on over to one of our stories, and read that, instead. Or simply wait until the next post comes out. For now, though, I hope that you’ll indulge me for this piece on statistics in modern communication.
Humanity’s fascination with numbers can be traced back to the Sumerians and their ancient writing system, cuneiform. In some of the species’ earliest cities, written communication was invented as a means of keeping track of numbers: census data, to be specific, which was used to levy taxes on the populace. Aside from showing that both writing and math were developed in order to facilitate taxation, this is arguably the start of humanity’s fascination with using numbers to explain the world around it. As we developed new mathematics and new techniques for recording information, the unique capabilities of numbers were leveraged for wider-ranging applications. Geometry, for instance, shares the root geo, meaning earth, with geography and geology; it is called geometry, literally “earth measurement,” because the Egyptians used it to measure out parcels of land.
Early civilizations developed numbers as they found need for them; this is why many civilizations had no concept of zero for a very long time. Who needs to count nothing of something? Navigation was one of the primary drivers of advances in mathematics, requiring advanced time-keeping, accurate cartography, and numerous other number-based skills. None of these, however, really involved statistics. In fact, statistics probably didn’t become prevalent as a branch of mathematics until the early nineteenth century (it’s notoriously difficult to pin exact dates to these things), and much of the math that modern statistics leverages, probability in particular, wasn’t developed until the seventeenth century.
These early probability calculations were used to analyze games of chance and to develop insurance policies (so far we’ve connected the development of statistics with taxes and insurance, everyone’s favorite topics), and later to analyze demographic trends and census data. This is not terribly surprising given the time periods involved: both fall in the late Renaissance and early Enlightenment, when the idea of empirically understanding the natural world through numbers became popular and possible for the first time. Statistics duly became a favorite tool of scientists, government officials, and insurance agents.
All of these efforts, though, were limited by a fundamental fact: data was hard to generate and track. Statistics, therefore, were leveraged mostly where they could be most useful or profitable. That changed radically in the twentieth century, with the advent of computers, especially the personal computer. Indeed, some studies have suggested that Microsoft was so successful in its early days primarily because of the capabilities of its spreadsheet tool, Excel. Computers didn’t merely enable the easy and rapid aggregation and analysis of large data sets; they made it easier to create them, too. With computers came sensors, digital memory, automation, and other tools that allowed for the simple generation of huge, technical data sets. With modern technology, I routinely generate and analyze hundreds of thousands of data points with ease.
Easy-to-generate, easy-to-use data has led to a meteoric rise in the influence that statistics have on just about everything. In a repudiation of logical fallacies, human inadequacies, emotional reasoning, and opinion flaunting, statistics are wielded as the unblockable weapon, the sword that no shield can stop. After all, if the numbers back up a position, and the numbers were objectively generated and scientifically analyzed, they can’t possibly be wrong. Consumer products, sports, business performance, political platforms: these are just a few of the fields that have taken to using statistics as the final answer. The numbers don’t lie, after all. The numbers are going to say what they’re going to say. This is the dominance of Big Data.
In some respects, that’s a good thing. Statistics do make for a more informed populace. They enable science, make models more accurate, systems more useful, and decisions more objective. Leveraged as a means of analysis, statistics are a powerful tool that enables all kinds of advantages. With Big Data, we can come to an understanding of our world that is more thorough, more detailed, and more comprehensive than anything we could have imagined achieving a century ago.
There is, however, a dark side. Statistics are a tool, just one of many, and they are a tool for analysis, not argument. That’s where the danger lies: using statistics as a weapon of argument. Whatever is being argued, leveraging statistics is fundamentally misleading. By their very nature, statistics are a reduction technique. They take a huge set of data that a human can’t easily interpret, and boil it down to a few numbers with which we can more easily interact. That’s useful if you understand what went into the original data set, how those numbers came to be, and the process by which the statistics were derived, but that’s information that isn’t readily available to the average information consumer.
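To make the idea of reduction concrete, here is a minimal sketch in Python. The two data sets are invented purely for illustration, and the `summarize` helper is my own hypothetical function, not anything standard. It shows how two very different sets of numbers can boil down to the same headline average.

```python
from statistics import mean, median, pstdev

# Two made-up data sets (hypothetical values, chosen only for illustration).
steady = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
lopsided = [0, 0, 0, 0, 0, 0, 0, 0, 0, 500]


def summarize(values):
    """Reduce a whole data set to a few headline numbers."""
    return {
        "mean": mean(values),      # the usual "average"
        "median": median(values),  # the middle value
        "spread": pstdev(values),  # population standard deviation
    }


steady_summary = summarize(steady)
lopsided_summary = summarize(lopsided)

print("steady:  ", steady_summary)
print("lopsided:", lopsided_summary)
```

Both sets share the same mean, so anyone shown only “the average” could not tell them apart. The median and the spread tell a very different story, but only if someone chooses to report them.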
When it comes to statistics, their utility, clarity, or obfuscating tendencies are largely wrapped up in context. The framing of a statistic can provide it with radically different meanings. A favorite technique for framing a statistic is actually to not frame it at all, allowing people to draw their own conclusions. A study might conclude “74% of people with this background and in that condition will benefit from a specific product under a specific, controlled set of circumstances.” What might be presented at large is simply “74% of people will benefit from the product.” That is technically true, but without the context, the presumption becomes that it refers to people in general.
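A little arithmetic shows how much the missing context can matter. The sketch below uses entirely hypothetical numbers: the population size and the size of the studied group are my own inventions, and the assumption that nobody outside the studied group benefits is a deliberate worst case, not something any study claims.

```python
# All numbers below are hypothetical, invented to illustrate the framing problem.
population = 1000             # everyone the bare headline seems to describe
studied_group = 200           # people "with this background and in that condition"
benefit_rate_in_group = 0.74  # the 74% figure, which applies only to that group

# Worst-case assumption for this sketch: the study says nothing about people
# outside the group, so suppose none of them benefit at all.
benefited = round(studied_group * benefit_rate_in_group)
worst_case_overall_rate = benefited / population

print(f"Within the studied group:   {benefit_rate_in_group:.0%}")
print(f"Worst case across everyone: {worst_case_overall_rate:.1%}")
```

The real figure for the wider population could be anywhere between that worst case and something much higher. The point is that the headline number alone cannot tell you which.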
Scale is another problem. Numbers in and of themselves are largely meaningless; despite some metaphysical arguments to the contrary, mathematics is not fundamental to nature. It is only by comparing numbers that useful, relevant meaning can be derived. Showing a graph with a sharp uptick will be dramatic and drive home a desired point, but that spike could be over a tiny time span, or perhaps it’s really just a massively magnified view of a nanoscale change. Even if the graphs are properly labeled, the labels will do little to mitigate the visceral response to the initial visual.
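The effect of axis choice can be put into numbers. In this rough sketch, the measurement values and axis limits are hypothetical, and `fraction_of_axis` is a helper I made up for the illustration. It computes how much of a chart’s height the same small change would occupy under two different vertical scales.

```python
def fraction_of_axis(start, end, axis_min, axis_max):
    """Share of the vertical axis that a change from start to end occupies."""
    return abs(end - start) / (axis_max - axis_min)


# Hypothetical measurement that rises from 100.00 to 100.05.
start, end = 100.00, 100.05

# The same change drawn on a tightly zoomed axis and on one that starts at zero.
zoomed = fraction_of_axis(start, end, axis_min=99.99, axis_max=100.06)
from_zero = fraction_of_axis(start, end, axis_min=0.0, axis_max=110.0)

# The change relative to the starting value, independent of any chart.
relative_change = (end - start) / start

print(f"Zoomed axis:     change fills {zoomed:.0%} of the chart height")
print(f"Zero-based axis: change fills {from_zero:.3%} of the chart height")
print(f"Relative change: {relative_change:.3%}")
```

On the zoomed axis the change takes up most of the chart and looks like a spike; on the zero-based axis it is all but invisible. Neither chart is lying about the data, which is exactly why the choice of scale deserves a second look.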
Statistics aren’t going away. Indeed, as computing and sensors continue to become more powerful and more accessible, they will likely only become more prevalent. It is imperative, therefore, that we be informed consumers of data. Pay attention to context, look for the labels on graphs, and consider possible alternative interpretations. Mostly, question the information with which you are presented. Those numbers have been crafted to serve an agenda, and so understanding statistics becomes a matter of understanding people. People will use statistics to further their own agendas, and it’s not as simple as merely using only favorable statistics. Two opposing groups might present the same statistic in different ways, and it will serve them equally well.
Although it is worth questioning statistics in any context, it is especially pertinent when they are being used in argument. Whether in politics, in advertising, or in a business meeting, beware of someone trying to back their “side” up with numbers. Even if it is a good-faith attempt to harness data to make a point, there is no way to be certain that there are no nefarious motives, and people will naturally tend to present data that is most favorable to them and their needs and desires.
Aside from questioning the numbers with which we are presented, there are some circumstances where it is worth performing your own analysis. If something is especially significant to you, it might well be worth taking the time to find the original data sets and analyze them yourself. That way, you will be intimately familiar with how your numbers come to be. With the profusion of data, this is far from tenable in every case, but it is worthwhile in some.