Big Penguin Polling Looks Real To Me
It'd be a lotta work for a grift.
This is a shockingly long post about the internal operations of an arguably-fake polling firm. Consider this a one-time replacement for my Playing PredictIt feature, which is currently suspended. If “pollster reviewing” isn’t your cup of tea, skip it, with my blessing.
Big Penguin Polling is a loudmouthed, Trump-right Twitter account launched in 2022. BPP claims to run polls. Its presentation is… unconvincing. Here is their website:
I’m not going to pick on Weebly, because I know what it’s like to accomplish something (like run a poll) and then struggle to pull together the money and professional resources to showcase your accomplishments properly. But the capitalization is so bad it’s like a deliberate provocation. (He capitalized “personal Bias”! The throat constricts!)
De Civitate is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
BPP’s polling results look crazy good for Republicans, but their website offers no detailed data. No crosstabs, nothing, not even a “date of poll.” Here’s a screenshot of their “recent polls” page from 8 September 2022:
For context, as of 8 September, the 538 polling average, which I consider the best in the world, had Fetterman beating Oz by 8.0% and Shapiro leading Mastriano by 6.9%. So BPP’s results were, respectively, 8.6% and 9.1% (!) more favorable to Republican candidates than the average. Not a single poll in the 538 database has ever shown Mastriano in the lead, and just a handful have even put the race within single digits.
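The “more favorable to Republican candidates” figures are simply the gap between two margins. A minimal sketch, assuming margins are written as the Democrat’s lead in points (negative when the Republican leads); `gop_skew` is a hypothetical helper, and BPP’s own margins are backed out from the numbers above (8.0 - 8.6 and 6.9 - 9.1) rather than read from BPP’s release.

```python
def gop_skew(avg_dem_margin: float, poll_dem_margin: float) -> float:
    """Points by which a poll is more Republican than the average.

    Margins are the Democratic lead in points; a negative margin means the
    Republican leads. A positive result means the poll leans GOP.
    """
    return round(avg_dem_margin - poll_dem_margin, 1)


# Pennsylvania, 8 September 2022: 538 averages from the text, with BPP's
# margins implied by the stated gaps (8.0 - 8.6 = -0.6, 6.9 - 9.1 = -2.2).
races = {
    "PA-SEN (Fetterman vs. Oz)": (8.0, -0.6),
    "PA-GOV (Shapiro vs. Mastriano)": (6.9, -2.2),
}

for race, (avg, bpp) in races.items():
    print(f"{race}: BPP is {gop_skew(avg, bpp):+.1f} points toward the GOP")
```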
This is pretty typical of BPP, which keeps showing a red wave rising, even as other polls have seen signs of a red wave dissipate over the summer.
That red wave just happens to dovetail perfectly with BPP’s political takes, which range from rah-rah-Republican:
Oh, and, yeah, their current Twitter avatar is a customized hot anime waifu wearing suspiciously short shorts, thank you for asking.
After a recent doxxing by some sloppy[2] jerk at the Philadelphia Inquirer, it turned out Big Penguin is run by some rando 24-year-old from North Carolina who has no background in public polling and no meaningful Internet footprint. He has (reportedly) two silent partners, whose names are not known to the public. (One of them appears to use the alias (?) “Elizabeth Strawyberry,” but that trail ran cold fast for me.)
In short, BPP is not an outfit that inspires confidence. It looks for all the world like one of many grifts in Trumpworld: tell Trump fans on Twitter (the type who follow @barnes_law) what they want to hear, then make money off those wishful-thinking fans via Patreon.
In August, I registered my skepticism:
I kept following BPP on Twitter, but not closely.
What We Know About Big Penguin’s Operations
This methodology is pretty plausible, given what we know about Big Penguin: it is clunky, it is time-intensive, it involves writing 50 branches of a survey when you’re only actually asking the end users four questions, it’s mildly vulnerable to hacking… but it would work.[3] I even put together a little fake survey on MeetingPulse myself to make sure it could be done.
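Footnote 3’s arithmetic shows why each extra demographic question is so costly under this setup: every existing branch has to be split once per answer option, so the branch count multiplies rather than adds. A small sketch, taking the 50-branch base from the text as given; `branches_after_adding` is a hypothetical name, not a MeetingPulse feature.

```python
from math import prod


def branches_after_adding(base_branches: int, new_question_options: list[int]) -> int:
    """Branch count after adding questions to a fully branched survey.

    Each existing branch must be duplicated once per option of every new
    question, so the count grows multiplicatively.
    """
    return base_branches * prod(new_question_options)


BASE = 50  # BPP's current branch count, per the text

print(branches_after_adding(BASE, [4]))     # add a four-option children question
print(branches_after_adding(BASE, [4, 3]))  # ...plus a three-option marital-status question
```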
I frankly do not believe BPP’s claim in the video to be doing half of these as live caller surveys. Live caller surveys have a response rate of roughly 6%. BPP’s average survey claims roughly 1200 total responses. If half of those responses are live phone interviews (meaning 600 successful phone interviews), then the good folks at BPP (I understand there are three of them) are calling 10,000 people on the phone over the course of a 4-5-day survey. I’ve done political phone banking, with shorter messages than these polls. It’s a long slog. Even if live-caller polling were their full-time jobs, there literally aren’t enough hours in the day, unless they hire a call center. If BPP is hiring a call center, they aren’t talking about it, and there’s no reason to believe they can afford one.
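The phone math is just completes divided by response rate. A minimal sketch using the text’s figures (about 6% response, 1,200 responses, half by phone, three staffers); the five-day field period is an assumption at the generous end of the 4-5-day range, and `calls_needed` is a hypothetical helper. `Fraction` keeps the division exact so rounding up never overshoots because of floating-point noise.

```python
import math
from fractions import Fraction


def calls_needed(completes: int, response_rate: str) -> int:
    """Dials needed to reach a target number of completed interviews.

    response_rate is a decimal string such as "0.06" so the arithmetic stays
    exact; the result is rounded up because you can't make part of a call.
    """
    return math.ceil(Fraction(completes) / Fraction(response_rate))


TOTAL_RESPONSES = 1200        # BPP's typical survey size, per the text
PHONE_SHARE = Fraction(1, 2)  # the video's claim: half by live caller
STAFF = 3                     # people at BPP, per the text
FIELD_DAYS = 5                # assumption: generous end of a 4-5-day field period

phone_completes = int(TOTAL_RESPONSES * PHONE_SHARE)
dials = calls_needed(phone_completes, "0.06")
per_person_per_day = math.ceil(Fraction(dials, STAFF * FIELD_DAYS))

print(f"{phone_completes} phone completes need {dials} dials")
print(f"about {per_person_per_day} dials per person per day")
```

That works out to roughly 667 dials per person per day, which is the “not enough hours in the day” problem expressed in numbers.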
However, if we assume, instead, that the overwhelming majority of BPP’s surveys are conducted via SMS text message, supplemented with a handful of successful live phone interviews… that’s much more plausible. Texts can be automated, they are cheaper than a call center, and response rates on texts are far higher than on live phone calls. BPP’s survey instrument is short, which should further boost the usable response rate. They could probably pay for their full survey of Arizona for... ballparking it... <$500? BPP’s video methodology presenter doesn’t seem to quite understand how CallMultiplier works,[4] but we can perhaps chalk that up to the presenter not being the same staffer who actually runs the surveys, and so misunderstanding the process slightly.
This is about what we would expect from a polling company with no professional experience and no budget. In fact, this is probably pretty close to what I would do if I tried to start a polling company. Of course, even with this shoestring operation, we’re still looking at $200/month for MeetingPulse + $500/survey for CallMultiplier, and they seem to be running at least a survey per week… which is certainly more than I could afford at age 24, but I got married that year and had a kid the year after. It’s not outrageous for a small group of interested, uncommitted people to come up with a few thousand dollars per month between them.
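Converting those figures into a monthly budget, as a sketch: it assumes the text’s $200/month for MeetingPulse, roughly $500 per CallMultiplier survey, and one survey a week (52/12 surveys in an average month). These are the post’s ballpark numbers, not BPP’s actual costs, and `monthly_cost` is a hypothetical helper.

```python
from fractions import Fraction

WEEKS_PER_MONTH = Fraction(52, 12)  # average number of weeks in a calendar month


def monthly_cost(platform_per_month: int, cost_per_survey: int, surveys_per_week: int) -> Fraction:
    """Fixed platform fee plus variable survey costs, averaged over a month."""
    return platform_per_month + cost_per_survey * surveys_per_week * WEEKS_PER_MONTH


one_a_week = monthly_cost(platform_per_month=200, cost_per_survey=500, surveys_per_week=1)
two_a_week = monthly_cost(platform_per_month=200, cost_per_survey=500, surveys_per_week=2)

print(f"one survey a week:  about ${float(one_a_week):,.0f} per month")
print(f"two surveys a week: about ${float(two_a_week):,.0f} per month")
```

One survey a week lands around $2,370 a month, and even doubling the pace keeps it near $4,500: a few thousand dollars, as the paragraph says.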
Better yet: Big Penguin actually started releasing crosstabs! (but only on Twitter, for some reason? Like, guys, you have a website. Nobody believes anything on Twitter. Put it on your website!) Here are all the crosstabs BPP has released—at least, the ones I could find scrolling their Twitter feed:
Here is how these results compare[7] to the 538 average as of the dates each survey concluded:
These results are strongly inclined to the GOP side. In every poll, Big Penguin showed a better result for the GOP than the average. In many cases, the inclination was strong (>7 points).
That doesn’t necessarily mean BPP is wrong. Plenty of reasonably good pollsters break away from the herd. In an era of increasing poll herding, results well outside the consensus can actually (sometimes) be evidence of accuracy. On the other hand, BPP wouldn’t be the first fake pollster publishing scam results to advance its own agenda!
Is Big Penguin Polling Fake?
Let’s be clear about what it means to be a “fake poll.” A fake poll is not a poll that gets the final results wrong, even badly wrong. (Everyone does that sometimes.) It is not a poll with a poor methodology. (You can weight for that.) It is not a poll that applied statistical weights in an arguably stupid way. (You can debate that.) It is not even a poll that made a mathematical mistake. (You can correct that.)
A fake poll is a poll that literally does not exist. A fake poll claims that it interviewed a bunch of voters for their opinions, but either (a) those interviews never happened, or (b) the pollster fraudulently changed the interview results. A fake poll reports faked data.
1. I Find No Evidence of Fakery
Big Penguin Polling sets off a ton of red flags for me. If I’ve done my job right, I’ve shown you all those red flags.
Those red flags should not be silenced. Some pollsters have been proved fake. In the Research 2000 scandal of 2010, the pollster used by Daily Kos (at considerable expense) was exposed as having faked much of his data (maybe all?). Despite some implausible results, R2K continued to be tolerated until a trio of statisticians cracked its belly open and found it full of lies. Those statisticians found reliable patterns of fakery in R2K’s polling: numbers that looked plausible to the naked eye, but stopped making sense once you lined them up next to each other and compared their properties to the properties they should have had according to the laws of statistics.
I am not a statistician and did not check everything imaginable in Big Penguin’s polls. I did not even check all the basic things, like verifying that the racial demographics for each state roughly match the population of registered voters. (Do 19% of Arizonan likely voters really identify as Black, as BPP found?) BPP does not do polling with enough frequency for us to identify “missing-zero errors,” either. But what I did check, checked out. There’s no apparent odd/even pattern in the male/female columns. There’s a reasonable amount of “messiness” in crosstabs (which is good). The numbers I checked all summed up correctly. So if Big Penguin is faking these crosstabs, they are putting a lot of work into developing them.
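For readers who want to try the odd/even check themselves: in genuine crosstabs, whether the male percentage for an answer is odd or even should say essentially nothing about the female percentage, so the two should share parity about half the time. A minimal sketch with an exact two-sided binomial test; the helper names and the rows below are hypothetical, not BPP’s or R2K’s numbers, and a handful of rows gives the test very little power.

```python
from math import comb


def parity_matches(pairs: list[tuple[int, int]]) -> int:
    """Count (male %, female %) pairs that are both odd or both even."""
    return sum(1 for male, female in pairs if male % 2 == female % 2)


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k matches in n pairs when P(match) = 0.5.

    The null distribution is symmetric at p = 0.5, so doubling the smaller
    tail gives the two-sided p-value (capped at 1).
    """
    tail = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Hypothetical crosstab rows: (male %, female %) for each answer option.
rows = [(48, 41), (45, 52), (7, 7), (51, 44), (42, 49), (7, 7)]

k = parity_matches(rows)
p = two_sided_binomial_p(k, len(rows))
print(f"{k}/{len(rows)} pairs share parity, p = {p:.3f}")
```

A suspiciously tiny p-value over many rows (nearly every pair matching, or nearly none) is the kind of pattern the R2K analysis turned up; a middling value like this one is what real data should look like.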
“Well, maybe Big Penguin learned from the Research 2000 scandal and specifically designed their fake data to avoid that kind of detection?” Yeah, I guess, theoretically possible, but remember that the person who runs Big Penguin was eleven years old during the R2K scandal, an off-season pollster scandal that never even reached national headlines. Also, like, c’mon:
There is a kind of person out there who learns all the history and thinks six layers deep in order to perpetrate the perfect fraud. That person does not Capitalize every Other Letter like A Founding Father.
And why would they go to such lengths, anyway?
2. I Don’t See The Benefit To Faking
We all know why Research 2000 faked its results (and in great detail): it made a lot of money off Daily Kos. We all know why political grifters of all stripes (from the alt-Right to the Lincoln Project) fake their results: because they can make a lot of money off rubes, and they don’t even have to put a lot of details in (R2K probably made off with more profit, though).
Where’s Big Penguin’s payoff? Their Patreon has disappeared. Their Twitter following is pretty darned small (2.3k followers today). They never ask for donations, at least not that I’ve encountered. If it’s all a scam to move PredictIt betting markets, they’d be working a lot harder than they are to get their polls into the aggregators that drive those markets—and they’d have stopped working so hard when the CFTC cracked down on PredictIt. Maybe there’s a secret Discord where the Big Penguins fundraise? Still, this is not typical grifter behavior!
3. They Anticipated Trafalgar in Michigan
Trafalgar Group is an increasingly respected polling firm that leans red-ward and which has had a few good cycles since it launched. Their mixed methodology, short survey instrument, and pugnacious attitude resemble Big Penguin’s across the board. Trafalgar’s results often diverge from the polling average, but often look similar to Big Penguin’s. Since they’re doing similar things, with similar attitudes, and getting similar results, this would tend to suggest that both pollsters are tapping into some genuine vein of data that other pollsters aren’t seeing.[8]
Usually, Trafalgar rolls out a poll, and then Big Penguin rolls out a poll a short time later with a similar result. A committed cynic could attribute this to copying, or at least herding. That theory would make it hard to explain the large gaps between Trafalgar and BPP in Pennsylvania, but it would explain several other results.
However, in a couple of cases, Big Penguin has successfully posted results very similar to Trafalgar’s—before Trafalgar’s poll was conducted. In Michigan, Big Penguin’s poll found Whitmer winning by 4.4 points. (The polling average had Whitmer ahead by almost 13 points, so this was way outside the norm.) A matter of days later, Trafalgar conducted and released its own poll of Michigan. Result: Whitmer +4.1. Big Penguin not only released a poll it could not possibly have copied from Trafalgar; it nearly bulls-eyed Trafalgar’s result.
Likewise, in Arizona’s gubernatorial race, Big Penguin found Lake winning by 5.2 a few days before Trafalgar found her winning by 0.7. These results aren’t nearly as close to each other, but the polling average had Hobbs winning, and no published poll since the primary had shown Lake in the lead… until Big Penguin and Trafalgar did just that, in rapid succession. RMG Research then also released a poll on August 26th (again, after BPP released theirs) which showed Lake with a 2-point lead, lending their out-there finding more support.
So, twice, Big Penguin has gone out on what can only be called a polling limb, only to have that limb subsequently confirmed by another pollster. Either Big Penguin is pretty lucky, or they’re actually conducting polls!
(My memory of Research 2000, for what it’s worth, is that R2K very rarely went out on a limb, and, when it did, it never seemed to get any confirmation. Of course, that’s because they actually were faking their data.)
4. Above All, Big Penguin has the Benefit of the Doubt
You should believe that polls are real until proved fake. This is the most statistically reliable method for building a successful election model. It is not too hard to prove that a fake poll is fake, but it is quite hard to prove that a real poll is real (at least, without disclosing detailed internal data that pollsters generally don’t disclose). Moreover, herding aside, fake polls are rare. If your heuristic is to distrust all polls until proved “real,” you will miss out on important data and your model will suffer for it. In 2016, if you wanted to ignore the brand-new Trafalgar Group’s shockingly red polling (because they hadn’t proved their legitimacy yet), you would have missed out on one of the only signals of Trump’s imminent victory.
That’s why FiveThirtyEight has always taken what it calls an “inclusive attitude” toward polls, saying, “FiveThirtyEight has traditionally accepted any poll from any firm so long as we don’t have evidence the poll or pollster is an outright fake.” This openness has been one of FiveThirtyEight’s strengths, and is why it constantly trounces, say, RealClearPolitics’ herky-jerky, closed-off polling average. FiveThirtyEight wants to know everything about a race, and every single poll adds something to what we know. None can be discarded in the pursuit of knowledge, so 538 takes all the polls it can find.
Sure, FiveThirtyEight weights reputable, established, methodologically proven pollsters more heavily, as it should, but they include everything, taking whatever epistemological leverage they can from every data point. That’s why 538 always wins, and why De Civitate has been stanning for Nate Silver since we launched a decade ago.[9]
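The “include everything, but weight it” approach can be pictured as a weighted average in which a low-rated pollster still counts, just for less. This is a toy sketch assuming weights of (quality score) × √(sample size); it is not FiveThirtyEight’s actual formula, which is considerably more elaborate, and the pollsters, margins, and quality scores are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    dem_margin: float  # Democratic lead in points (negative = GOP lead)
    sample_size: int
    quality: float     # hypothetical 0-1 trust score; unproven firms get low but nonzero


def inclusive_average(polls: list[Poll]) -> float:
    """Weighted mean margin: nobody is thrown out, weaker polls just count less."""
    weights = [p.quality * math.sqrt(p.sample_size) for p in polls]
    return sum(w * p.dem_margin for w, p in zip(weights, polls)) / sum(weights)


polls = [
    Poll("Established A", dem_margin=6.0, sample_size=900, quality=1.0),
    Poll("Established B", dem_margin=8.0, sample_size=1600, quality=0.9),
    Poll("Unproven newcomer", dem_margin=-1.0, sample_size=1200, quality=0.3),
]

print(f"with newcomer:    D{inclusive_average(polls):+.1f}")
print(f"without newcomer: D{inclusive_average(polls[:2]):+.1f}")
```

Dropping the newcomer moves this toy average by about a point; keeping it at a low weight lets its signal in without letting it dominate.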
Now that we have fairly sophisticated evidence that Big Penguin Polling is either:
(a) conducting polls like it says or
(b) doing a fairly crazy amount of work to fake poll results for no obvious benefit
…I think Big Penguin has met the low burden needed for the “inclusive” standard, at least for polls where BPP has published crosstabs.
Someone could still come forward and show good reason for excluding BPP. Perhaps someone will! Perhaps someone has been holding back his critique of Big Penguin because its flaws seemed so obvious as to require no explanation! I may not be seeing all that I should!
However, until someone actually does show that, I think Big Penguin now belongs in polling aggregators and election forecasts—including FiveThirtyEight’s model.
Big Penguin Could be Real and Still Wrong
I’ve said this twice, but I want to say it once more, with feeling: accepting that Big Penguin is really doing polls out there doesn’t mean that its results are correct.
In the 2020 cycle, the average pollster found Joe Biden was winning Wisconsin by 8.4 points… and that was the average. ABC/Washington Post, a very fine pollster, found Biden was winning by 17 points! In fact, Biden won by about half a point. Their polls were all real, not fabricated out of whole cloth, yet they were way off.
Meanwhile, in 2018, Trafalgar (also a fine pollster) predicted that Kemp would win GA-GOV by 12.3. They were off by nearly 11 points: Kemp won by 1.4. Pollsters are all trying their best, but, sometimes, some of them just have their fingers on the wrong pulse. Big Penguin may be doing its very best and still find itself off-base by, I don’t know, 7 points when election night arrives.
This could be exacerbated by Big Penguin’s relative inexperience with polling, which may make them more error-prone than, say, Quinnipiac. They don’t seem to be weighting anything at all, which is an acceptable but somewhat daring methodological choice, and I worry about things like their randomizer and the extent to which they intervene in polling buckets that look fishy to them (as mentioned in their methodology video). I can’t help noticing that BPP consistently shows Republican candidates winning insanely large chunks of the Black and Latino votes, while the White vote looks about “right.” Did the GOP finally break through with Black people and BPP is the only one who sees it, or is this just a polling malfunction?
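For contrast with an unweighted poll, here is the simplest kind of weighting: cell weighting (post-stratification) on a single variable, where each respondent’s weight is their group’s target share divided by that group’s share of the sample. A minimal sketch; the groups, targets, and votes are hypothetical, not BPP’s Arizona crosstabs or any real voter-file figures.

```python
from collections import Counter


def cell_weights(groups: list[str], targets: dict[str, float]) -> dict[str, float]:
    """Per-group weight = target population share / observed sample share."""
    counts = Counter(groups)
    n = len(groups)
    return {g: targets[g] / (counts[g] / n) for g in counts}


def weighted_share(respondents: list[tuple[str, str]], weights: dict[str, float], choice: str) -> float:
    """Weighted fraction of respondents whose vote equals `choice`."""
    total = sum(weights[g] for g, _ in respondents)
    hits = sum(weights[g] for g, vote in respondents if vote == choice)
    return hits / total


# Hypothetical sample of 100: group "B" is 20% of respondents but only 5% of
# the (hypothetical) target electorate, and it leans unusually Republican.
respondents = (
    [("A", "R")] * 40 + [("A", "D")] * 40
    + [("B", "R")] * 12 + [("B", "D")] * 8
)
targets = {"A": 0.95, "B": 0.05}

weights = cell_weights([g for g, _ in respondents], targets)
unweighted = sum(vote == "R" for _, vote in respondents) / len(respondents)
weighted = weighted_share(respondents, weights, "R")

print(f"unweighted R share: {unweighted:.3f}")
print(f"weighted R share:   {weighted:.3f}")
```

Because the over-represented group is scaled down from 20% of the sample to its 5% target, its unusual lean moves the topline much less.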
Also, the fact that Big Penguin’s polls seem to be real doesn’t mean we have to agree with anything else Big Penguin says or thinks. I mean, imagine having to listen to the insufferable @DataProgress for anything but their topline results! They speak in the register of my social class, but, counterpoint: ugh. Or, try to conceive of a world where appropriately weighting Morning Consult results meant we had to actually believe what we read in Politico. No thank you! Point is, polling data can add to our understanding of the world even if the polls’ authors are fairly out to lunch (including on matters of polling!).
Since Big Penguin appears to be collecting real polling data, we should factor that into our models of the 2022 midterm elections. You should treat a Big Penguin poll the same way you (or, more intelligently, the FiveThirtyEight model) would treat a Trafalgar poll of similar impact, discounted a little for BPP’s lack of track record and its other red flags.
A solid Trafalgar poll can bump a lightly polled race a few points off the fundamentals. You can see how this looks in FiveThirtyEight’s forecasts for CA-21 and GA-2, where the poll’s appearance visibly shifted the projected outcome overnight. (Switch from the Deluxe model to the Lite model to see what a huge impact a poll has in that model!)
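One rough way to picture how a single poll moves a lightly polled race: treat the fundamentals and the poll as two independent, noisy estimates and combine them by inverse-variance weighting, inflating the poll’s error for an untested pollster. This is a textbook precision-weighted average under assumed error sizes, not FiveThirtyEight’s model; every number below is hypothetical.

```python
def blend(prior: float, prior_sd: float, poll: float, poll_sd: float) -> float:
    """Inverse-variance (precision-weighted) average of two independent estimates."""
    w_prior = 1 / prior_sd ** 2
    w_poll = 1 / poll_sd ** 2
    return (w_prior * prior + w_poll * poll) / (w_prior + w_poll)


FUNDAMENTALS = 6.0  # hypothetical: D+6 from fundamentals alone
FUND_SD = 6.0       # hypothetical uncertainty of that prior, in points
POLL = -2.0         # hypothetical single poll: R+2
POLL_SD = 6.0       # hypothetical error for an established pollster
DISCOUNT = 1.25     # hypothetical error inflation for an untested pollster

established = blend(FUNDAMENTALS, FUND_SD, POLL, POLL_SD)
untested = blend(FUNDAMENTALS, FUND_SD, POLL, POLL_SD * DISCOUNT)

print(f"established pollster: D{established:+.1f}")
print(f"untested pollster:    D{untested:+.1f}")
```

With these made-up inputs, the established pollster’s poll pulls the estimate four points off the fundamentals, and the discounted one about three.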
Big Penguin’s results therefore shouldn’t fundamentally alter your view of the 2022 midterms. I stand by my statement a couple weeks ago that anything could happen. But Big Penguin’s findings might nudge the likelihood of an overall Republican victory up by, say, 1%.[10] That’s worth noting!
Now that it’s finished, I find I’m not sure what I got out of writing this post. If it backfires and Big Penguin Polling does turn out to be fake, I’ll look like an idiot. If it’s correct and BPP is worth including in models, I’ll look smart, but nobody outside De Civitate’s readership will know it. (Nate Silver does not read my blog, alas, so this post will not actually achieve its goal of getting BPP into the 538 database.) BPP itself is probably neither thrilled with all my negative comments about different aspects of their operations, nor particularly enamored of the idea of being included in the models with all those other polls BPP calls “fake.”
Oh, well. I get one really absurd deep-dive into a polling issue per election cycle (some of them wrong). This is 2022’s. I hope you, at least, enjoyed the ride. I’ll hopefully be back on Election Day Eve with my bisemiannual election preview, and then you’re safe from my weird poll stuff until 2024.
EDITOR’S NOTE: I did not contact Big Penguin Polling during the writing of this article. However, should they choose to reply to this article in any way, I would update this article with their reply.
UPDATE 21 October 2022: Big Penguin wrote to me after publication and we had a productive exchange, which I’ll share here (I’ve edited my responses for brevity, because my responses aren’t the interesting part here):
@BigPenguinPolls: Just wanted to say I read your article and have no problems with the negativity, I very much welcome more constructive criticism than just "Lol fake" and nothing more. I simply hope in 3 weeks we will be vindicated. Cheers Friend.
De Civitate: Hey, thanks! I'm grateful for your understanding.
What *is* BPP's response rate (on text vs on phone), and how many people at BPP are actually dialing phones every day? Thanks again.
@BigPenguinPolls: So our goal is to at least get 5% of phone interviews; with 2 of us calling all day from when we start at about 5 am to 12pm. Our goal was to get 50% if we can, but the rate people answer thr phone is much lower than you'd think along with those who immediately hang up.
We usually like about 2000 responses per survey, with exceptions like PA, before we go through and screen out suspicious answers such as a someone who says their pro-life but then supports abortion up to the third trimester(as a example)
So for our last week before our final generic ballot; were doing 3 states so we hope to have a total of 300 phone responses by the end for all 3 races combined, or around 60 responses a day via phone/30 a day by each caller.
De Civitate: …so [on a typical single poll] that's 413 phone calls per day, divided among two staff members, so about 207 phone calls per staffer per day? That's a lot of work, but very doable.
One thing that surprised me in that answer is that you screen out suspicious answers. My understanding is that most pollsters leave suspicious answers in, on the theory (1) they mostly cancel one another out, and (2) people are just fricking crazy and that may be their honest opinion. (Prior to 2016, I myself believed that every single person answering "Donald Trump" to GOP presidential polling surveys was 100% trolling the pollsters.) The prevalence of crazy results is so well-known that it even has a name: the Lizardman's Constant. So that is a methodological difference between your polls and, say, Marist, and perhaps a significant one. Could you say approximately what percentage of initial responses get screened out during the suspicion check? If it's 0.1%, NBD. If it's 10%, that could go a long way toward explaining the difference between your results and those of other firms.
@BigPenguinPolls: So we initially didn't screen out as many results as we did, simply taking them at face value. However, we were absolutely burned in California due to the Open Primary, and after going through the answers, we found several answers that should have ticked us off(Such as a Black Elderly Democrat saying they would vote for the MAGA Candidate, or Republicans with a disapproval of Biden voting Democrat.) Since then we've attempted to be more strict on our modeling, and we usually screen out around 5-6% of answers from suspicious answers.
END OF CONVERSATION (so far). Now for my comments on this:
First, it’s nice to have some phone stats confirmed. 5% of the sample is a lot more sensible than 50% of the sample. I’m more convinced than ever that BPP is actually out there doing data collection, not just faking their data.
On the other hand, the casual mention of filtering out “suspicious” results is very worrisome to me, because I know how very easy it is for a data collector to toss out valid responses. For example, BPP believes it’s inconsistent and incoherent for someone to identify as pro-life but also to support legal abortion through all nine months. I agree! But I also know from lots of other abortion polling done over the years[11] that a non-negligible percentage of Americans believes exactly that: they think abortion is wrong (and so identify as pro-life) but that the State has no business regulating it (for one reason or another). I think filtering out those responses is a mistake. Likewise for the other replies: sure, an elderly black Democrat is unlikely to vote for Trump… but such people do exist, and in large enough numbers to make or break some elections. It’s very dangerous for a poll not to measure those people.
It’s even more dangerous when the poll doing this is run by people with strong political opinions of their own. Cognitive bias is a subtle and powerful opponent; it naturally inclines one to treat the other side with more suspicion than one’s own. If a machine screened out 6% of responses for being “suspicious,” it might well delete 2% actual fake responses, plus 2% of real Red voters and 2% of real Blue voters. Deleting real voters is a bad polling practice, but, if they cancel each other out, it probably (probably!) doesn’t make much difference to the final result of the poll. But if a human being with natural cognitive biases screens out 6% of responses for “suspiciousness,” it’s much more likely that he or she will delete maybe 1% actual fake responses, 1% of real Red voters, and 4% of real Blue voters. That creates a 3-point swing in the final result!
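The 6% thought experiment can be checked with simple arithmetic. A sketch, assuming a hypothetical 1,000-response sample that is a dead heat, with 40 junk responses split evenly between Red and Blue; the removal counts mirror the text’s percentages (2%/2%/2% for a neutral screen, 1%/1%/4% for a biased one), and the `Sample` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    real_red: int
    real_blue: int
    junk_red: int   # junk/troll responses that happen to pick Red
    junk_blue: int  # junk/troll responses that happen to pick Blue

    def margin(self) -> float:
        """Red share minus Blue share, in percentage points of the sample."""
        red = self.real_red + self.junk_red
        blue = self.real_blue + self.junk_blue
        return 100 * (red - blue) / (red + blue)

    def screened(self, junk_each_side: int, real_red: int, real_blue: int) -> "Sample":
        """Remove junk evenly from both sides, plus some genuine responses."""
        return Sample(
            self.real_red - real_red,
            self.real_blue - real_blue,
            self.junk_red - junk_each_side,
            self.junk_blue - junk_each_side,
        )


# Hypothetical 1,000-response sample: a dead heat with 4% junk split evenly.
raw = Sample(real_red=480, real_blue=480, junk_red=20, junk_blue=20)

neutral = raw.screened(junk_each_side=10, real_red=20, real_blue=20)  # 2% / 2% / 2%
biased = raw.screened(junk_each_side=5, real_red=10, real_blue=40)    # 1% / 1% / 4%

for label, s in [("raw", raw), ("neutral screen", neutral), ("biased screen", biased)]:
    print(f"{label:>15}: Red {s.margin():+.1f}")
```

The biased screen moves the margin about 3.2 points (30 net responses out of the 940 left), in line with the text’s 3-point estimate, while the neutral screen leaves it untouched. Reporting both `raw.margin()` and the screened margin is essentially the redacted-versus-unredacted release suggested later in this update.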
Pollsters struggle with these questions all the time, and come to different answers on them. But BPP’s practice is, to my limited knowledge, very unusual. Pollsters tend to argue about weights and so forth, but outright redacting responses based on personal judgment is not something I’ve ever heard of.
BPP could cure this problem by releasing two results for each of its polls: the redacted version (which it uses in its topline results) and the unredacted version, which includes all responses from all voters. We would need crosstabs for both, which I recognize is a bit of extra work. This would be similar to polls that release both their “Likely Voter” and “Registered Voter” results. I encourage BPP to consider doing this, because it would give a lot of valuable insight into how their redaction process may or may not be biasing (or perhaps, if they’re superhumanly good at it, unbiasing) their results.
I still think BPP is providing some useful information and I still think it’s worth paying attention to—and perhaps even aggregating—its polls. I also applaud BPP for its transparency and openness when I asked them about these things; they were not obliged to answer any of my questions, and I really enjoyed our discussion. Nevertheless, I suspect that BPP’s results will skew slightly (maybe as much as 3-4%, hopefully much less) in the direction of the Republicans due to the redaction, and readers should take that into account when reading their polls.
Okay, gotta run now.
[1] Not wrong, but very obnoxious about it.
[2] The Inquirer concluded that BPP’s polls do not merely lack credibility (which, at the time, would have been fair), but actually called them “fake,” on woefully inadequate evidence.
[3] In the video, BPP suggests that it sometimes asks additional questions, like number of children or marital status. Given their methodology, this would not be practical. Adding a question about children with four options would multiply their survey branches from 50 to 200. Adding another question about marital status with Married / Divorced / Never Married would take it to 600. This would involve an amount of setup work I find implausible.
Unsurprisingly, none of BPP’s actual crosstabs show evidence of asking these additional questions. They consistently ask only: Race, Sex, Age, Candidate choice (1-2 questions), and likelihood to vote. (Their likely voter screen simply ends the survey for anyone who doesn’t answer that they are “somewhat” or “very” likely to vote. They allowed “unsure” until Labor Day, a reasonably traditional cutoff.)
[4] You’d want to use the pay-per-message rates, not (as the video claims) the unlimited rates, because your list changes every time you fire off the texts.
[5] This poll was, according to BPP, commissioned by Patriots for Washington. I find no record of that group’s existence on an admittedly cursory search.
[6] This poll was, according to BPP, commissioned by the Socialist Progressive Party of Virginia. I find no record of that party’s existence on an admittedly cursory search.
[7] The Pennsylvania races in this chart use the August polls, not the July polls. This is because I didn’t discover the July crosstabs until after I made the charts. (They were hidden at the end of a trail of Twitter breadcrumbs; apparently Pollcruncher Elizabeth had a power outage that delayed the crosstabs for several days.)
[8] …which doesn’t mean that this vein of data conforms to the truth of the election, just that it taps into something that exists in the real world and isn’t fraud. We’ll find out on Election Night which veins of data were true!
[9] And I’ve been reading him even longer! I discovered him just after the 2008 primaries ended. Can’t recall how. I grew so much that summer as a poll-reader.
[10] Maybe less. That’s a guess. I don’t have the 538 model source code, although I hope Nate Silver has left instructions for it to be open-sourced upon his death, or election science will be set back decades.
[11] I should include a citation for this but have to run out of the house in five minutes. If someone asks in the comments, I’ll look it up for you. :) But, really, the wild fluctuations in pro-choice/pro-life labeling from year to year, even as polling on underlying policies like heartbeat bans and partial-birth bans remains pretty constant, tells you all you need to know about the confused masses using both the pro-life and the pro-choice labels.