Here’s What Generative AI Does – and Doesn’t – Tell You

Lies of omission can be as impactful as an AI “hallucination.”

Photo by Pierre Bamin on Unsplash

Elton John is an "iconic" singer. Wilt Chamberlain was a "dominant" basketball player. Eve 6 is a "catchy" and "energetic" alternative rock band.

These aren't my opinions – they're descriptors pulled from ChatGPT responses. I analyzed thousands of responses from this generative AI platform to better understand trends in how ChatGPT references people, organizations, and events. The platform tends to cluster around common descriptors like the ones above, which raises questions about where those descriptors come from and how well they can or should apply. Still, I grew more concerned about how outlier responses can mislead users.

Outlier responses are a big deal. A minority of users can get vastly different takeaways from a given topic and miss important information and events about individuals or organizations they’re trying to learn about. So in this article, I'll be sharing some of the outliers I found and my thoughts on what this means for both users of AI tools and the organizations affected by them.


Generative AI platforms like ChatGPT, trained on large data sets, can provide impressively coherent and relevant results that feel uniquely written for users. A natural consequence of this training is recurring patterns in replies. If a particular descriptor, event, phrase, or other association appears more commonly in a model's training data, the more likely it will appear in AI-generated responses – and vice versa.

Many people have written about subpar accuracy in AI responses, but consistency is just as important. In this case, commonly used answers are a good thing.

If you ask ChatGPT about this, it is likely to say, "Users should be cautious and critically evaluate AI-generated information, considering the possibility that it may not capture the full range of descriptors or perspectives." But it's much easier to evaluate information that is present in responses as opposed to what is left out of responses entirely.

To evaluate the impact of this phenomenon, I tested a selection of broad queries about several well-known people and organizations. These queries were structured to give ChatGPT as much freedom as possible in its responses:

  • For individuals: What can you tell me about XYZ?

  • For organizations: Tell me about the history of XYZ.

I decided to tailor the question differently for organizations as, for some reason, using the first phrase for these groups resulted in strategic analyses instead of a factual/event-oriented readout. Here's what I found.


Not all personal connections are worth naming (according to ChatGPT).

Two of the first politicians I tested were Joe Biden and Donald Trump. Aside from the obvious comparisons, they each had prominent careers before they took office, and are associated with significant events in ChatGPT's training data (albeit cut off in Biden's case in September 2021, which is when ChatGPT's data ends). When comparing ChatGPT's responses for these two men, I immediately saw meaningful differences in how it handled each topic – both in the substance of its responses and what was left out.

ChatGPT mentioned Joe Biden's family frequently in its responses about him. His children – Beau, Hunter, Ashley, and the late Naomi – were referenced by name in 74% of replies using GPT-4. On the other hand, Donald Trump's children are rarely mentioned by name, appearing in just 3% of responses. Biden's wife, Jill, and his first wife, Neilia, were referenced in 61% of responses, compared to just 4% of replies combined for Ivana Trump, Marla Maples, and Melania Trump.

Donald Trump's family was featured frequently in news coverage during his administration, at least equal to the amount written about Biden's family. I don't know why ChatGPT mentions one more often than the other. But interestingly, this is a phenomenon unique to GPT-4. When testing in GPT-3.5, response rates for Biden's children and wives drop to 2% and 1%, respectively. It is an excellent example of how updated or upgraded AI models prioritize information differently in responses and the lack of transparency involved.
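For readers who want to run a similar tally on their own saved responses, here is a minimal sketch of how a mention rate like the ones above could be computed. It is an illustration rather than the exact method used for this analysis: the function name `mention_rate` and the sample texts are hypothetical, and it assumes a response "mentions" someone when one of the listed names appears as a whole word.

```python
import re

def mention_rate(responses, names):
    """Share of responses that mention at least one of the given names."""
    if not responses or not names:
        return 0.0
    # Whole-word, case-insensitive match, so "Beau" does not match "beautiful".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    hits = sum(1 for text in responses if pattern.search(text))
    return hits / len(responses)

# Hypothetical stand-ins for saved responses, not real ChatGPT output.
sample = [
    "Joe Biden served as vice president. His son Beau Biden died in 2015.",
    "Joe Biden was a senator from Delaware for 36 years.",
    "Biden and his wife Jill have a daughter, Ashley.",
    "Biden took office in January 2021 with Kamala Harris as vice president.",
]

children_rate = mention_rate(sample, ["Beau", "Hunter", "Ashley", "Naomi"])
print(f"Children named in {children_rate:.0%} of responses")
```

Simple name matching like this will miss indirect references ("his eldest son"), so it is best read as a lower bound on how often someone comes up.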

A chart which visually shows the aforementioned mention trends for Joe Biden. Barack Obama, Beau Biden, and Donald Trump are mentioned most frequently by ChatGPT, followed by the rest of his family and Kamala Harris.

As with Biden's family, ChatGPT referenced Vice President Kamala Harris in 48% of broad responses about Joe Biden. Meanwhile, there were zero mentions of Mike Pence in these general answers about Donald Trump. Specific follow-up queries would have mentioned him (e.g., "Who was Donald Trump's vice president"). Still, ChatGPT's model does not prioritize Pence in what users might want to know about Trump in general.

A chart which visually shows the aforementioned mention trends for Donald Trump. Joe Biden, Hillary Clinton, and Barack Obama are mentioned most frequently by ChatGPT, with the rest of his family referenced very rarely in results.

So depending on the subject, similarly close personal associations can be rendered invisible in ChatGPT responses. And it can get serious when we're talking about significant, negative information about people and organizations that matter to users.

What slips through the cracks can influence user opinions.

The January 6th insurrection remains one of the most widely remembered moments from Donald Trump's presidency. And indeed, ChatGPT responses to broad queries about Trump typically reference this event – either as a standalone fact or through a description of his second impeachment.

But even an event as prominent as this was only present in 95% of the responses I reviewed. That might sound like a lot – and it is – but so is 5% when talking about a platform at the scale of ChatGPT. Here's an example passage about his presidency from the 5% minority:

"Trump's presidency was characterized by its confrontational style, policy changes through executive order, and significant turnover of staff and cabinet members. His administration implemented a tax overhaul, attempted to repeal the Affordable Care Act, initiated a hardline immigration policy, and saw multiple investigations into Russian interference in the 2016 election. Trump was impeached twice by the House of Representatives, once in December 2019 and again in January 2021, but was acquitted both times by the Senate.

In November 2020, Trump ran for re-election against Democrat Joe Biden. The election took place amidst the ongoing COVID-19 pandemic, which significantly impacted the United States. Biden won the election, but Trump refused to concede, alleging without substantial evidence that the election was fraudulent."

Contrast that with this response passage, which does reference the event (emphasis mine):

"His presidency was marked by significant controversy and political division. His policies included tax cuts, deregulation, a hard-line immigration policy, and attempts to repeal the Affordable Care Act ("Obamacare"). He also appointed three justices to the Supreme Court. Trump was impeached twice by the House of Representatives - first in December 2019, on charges of abuse of power and obstruction of Congress related to the Ukraine scandal, and then again in January 2021, charged with incitement of insurrection following the storming of the U.S. Capitol by his supporters. He was acquitted both times by the Senate.

In November 2020, he ran for reelection but was defeated by Democratic candidate Joe Biden. Following the election, Trump and some of his supporters claimed without evidence that there had been widespread voter fraud, leading to numerous lawsuits and the aforementioned attack on the U.S. Capitol."

Note the differences in tone and comprehensiveness – in this case, I'd argue that the added context of the latter passage is essential for understanding the topic.

I also noted similar issues with some of the organizations I looked at. Ask a random person on the street to name the most important event they can think of relating to ExxonMobil. I'd bet nearly all would respond with the Exxon Valdez oil spill from 1989. It was an infamous environmental disaster, well-documented, and well-represented in national conversation. Yet when asked about the history of ExxonMobil, just 46% of GPT-4 responses mentioned the Exxon Valdez disaster, and that drops to 13% for GPT-3.5.

Consider two passages covering the history of ExxonMobil from the mid-20th century to today. Here's the first, one that does not reference the disaster (Editor's note: "both companies" refers to Exxon and Mobil, pre-merger):

"Post-War Development: Both companies expanded their operations worldwide in the decades after World War II. They developed and marketed new products, including various types of fuel, lubricants, and petrochemicals. They also invested heavily in exploration and production activities around the world.

Merger into ExxonMobil: The culmination of these parallel histories occurred in 1999, when Exxon Corporation and Mobil Oil Corporation merged to form Exxon Mobil Corporation, commonly known as ExxonMobil. The merger, valued at $73.7 billion, was one of the largest in history at the time. The merged entity inherited the histories and cultures of both Exxon and Mobil, as well as their strengths in all aspects of the oil and gas industry."

What could you compare this passage to? There is no direct comparison – a response either mentions the event or doesn't. There's a big hole between those two paragraphs, but how would you know without context? Here's the second example passage from the minority (emphasis mine):

"Mid 20th century: The evolution of Standard Oil descendants: Two of these offshoots would eventually become Exxon and Mobil: Standard Oil of New Jersey (which became Exxon) and Standard Oil of New York (which became Mobil). Both companies underwent several name changes throughout the mid-20th century. Standard Oil of New Jersey changed its name to Esso in 1931, and later became Exxon in 1972. Meanwhile, Standard Oil of New York became Socony-Vacuum Oil Company in the 1930s, then Socony Mobil Oil Company in the 1950s, and finally just Mobil in the 1960s.

Late 20th century: The Exxon Valdez oil spill: One of the darkest moments in Exxon's history was the Exxon Valdez oil spill in 1989. The Exxon Valdez, an oil tanker owned by the Exxon Shipping Company, spilled hundreds of thousands of barrels of crude oil in Prince William Sound, Alaska. The incident resulted in significant environmental damage and remains one of the largest oil spills in U.S. history.

1999: The ExxonMobil merger: Exxon and Mobil merged in 1999 to form ExxonMobil. The merger was valued at approximately $75 billion and created the largest company in the world at the time. The consolidation was a response to low oil prices and was seen as a move to improve efficiencies and boost profitability. The merger was also controversial and faced a great deal of scrutiny due to concerns about reducing competition in the oil industry.

21st Century: Recent developments: In the 21st century, ExxonMobil has faced a number of challenges and controversies, including those related to climate change. The company has been criticized for its environmental record and its role in climate change denial. In the 2010s, it emerged that Exxon had known about the dangers of climate change since the 1970s but had publicly denied these risks."

Quite a difference in the tone, depth, and number of potential takeaways for a reader! Why is this happening? Generative AI platforms ingest a corporation's content marketing and advertising as well as independent news coverage, and one of these groups is significantly more likely to cover adverse events and updates than the other. ExxonMobil is no exception – when I plugged my questions into other tools that disclose their sources, like Bing's AI assistant, the answers frequently listed ExxonMobil's websites.

When people use ChatGPT, I doubt they mean to get an organization's cherry-picked POV, but that's what they may be getting on many topics without knowing it. And those selections can have downstream impacts on buyer/consumer journeys.

Frequent associations can narrow the focus of user inquiries.

I was particularly interested in how ChatGPT would highlight the works of musical artists and how frequently certain songs would appear in responses.

Eve 6 is an alternative rock band (not to be confused with Third Eye Blind) best known for their hit songs "Inside Out" and "Here's to the Night" (especially the latter for anyone whose prom was in the '00s). As you'd expect, those songs were referenced in every single response from ChatGPT when asked about the band. Two other songs – "Promise" (86%) and "Leech" (77%) – also featured frequently in responses. It makes sense – each ranked highly in contemporary music charts and appeared in album reviews. But response rates quickly dropped off from there:

  • "Victoria": 29%

  • "On the Roof Again": 20%

  • "Think Twice": 13%

  • "Tongue Tied": 13%

  • "Open Road Song": 12%

Another artist I tried, Elton John, shows a similar trend despite a much more extensive discography (sorry Eve 6). On the one hand, it makes sense because there's only so much room in a single chat response. On the other hand, it limits what can appear in answers about prolific individuals and groups.

For John, "Your Song" (95%) and "Rocket Man" (93%) were referenced most frequently in responses, followed by "Candle in the Wind" (87%) and "Tiny Dancer" (58%). Other song response rates include:

  • "Bennie and the Jets": 48%

  • "Goodbye Yellow Brick Road": 47%

  • "Don’t Let the Sun Go Down on Me": 45%

  • "Can You Feel the Love Tonight": 25%

  • "Don’t Go Breaking My Heart": 19%

Missing from this list are #1-ranking singles "Crocodile Rock" (15%) and "Philadelphia Freedom" (6%), as well as "Lucy in the Sky with Diamonds," "Island Girl," and "Cold Heart," which were never mentioned at all in my sample.

Users will not just employ generative AI to learn more about artists, actors, and other performers; AI-generated responses will inform what that person clicks, listens to, or watches next. When filtered through a generative AI, I can imagine a world where users are led to a relatively small number of results to follow up on – concentrating discovery around a handful of popular assets through streaming platforms and other aggregators.

Now this is relatively low stakes – and I'm sure the artists mentioned here would be happy for users to jump in anywhere in their respective discographies. But imagine the implications for other industries:

  • CPG: A customer is fed up with their usual brand of coffee and asks ChatGPT for a substitute, but they only get a list featuring long-established brands (and not your company's newly-released brand, designed just for their taste profile).

  • B2B: A buyer is preparing to distribute an RFP in a sector they are less familiar with and queries ChatGPT for a list of businesses to send it out to. They get an outlier response that excludes your organization, even though it's the market leader.

So, generative AI has the potential to alter buyer journeys in a meaningful way. What now?

For users, I think the takeaways are two-fold:

  1. Regenerate responses often: If you're using ChatGPT or another tool, regenerating the response can give you additional information that could better answer your questions. The context can help evaluate the original response -- even if it's an outlier.

  2. Turn to (human) experts: Generative AI can be a handy tool, but it is not as reliable as going to human sources. "Trust but verify" doesn't work well for omissions and only puts you behind the curve in acclimating to a subject. Not to mention that human experts will provide a broader lens than what generative AI can offer by default.
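To put a rough number on the first takeaway, the sketch below estimates how likely a fact is to show up at least once across several regenerations. It assumes each regeneration is an independent draw with the same mention rate, which may not hold for a real chat session, and the function name `chance_seen` is hypothetical. The 46% figure is the GPT-4 Exxon Valdez rate from earlier in this article.

```python
def chance_seen(rate, tries):
    """Probability a fact appears at least once in `tries` responses.

    Assumes every response is an independent draw with the same mention rate.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if tries < 1:
        raise ValueError("tries must be at least 1")
    # The fact is missed only if every single response leaves it out.
    return 1.0 - (1.0 - rate) ** tries

# 0.46 is the share of GPT-4 responses that mentioned the Exxon Valdez spill.
for tries in (1, 2, 3, 5):
    print(f"{tries} response(s): {chance_seen(0.46, tries):.0%}")
```

Under that independence assumption, a fact that appears in 46% of responses would show up at least once in roughly 84% of three-response sessions, which is why a couple of regenerations can meaningfully change what you learn.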

For organizations, it’s time to get savvy about response trends:

  1. Learn what generative AI tools are saying about your company and your brands: If there's one thing I can say from this exercise, it's that you can't get a sense of what AI platforms are saying on a subject from just a handful of queries. You'll want to know immediately if negative information is surfacing for your customers.

  2. Use AI platforms as intelligence for what users may be learning about competitors: As these platforms enter the mainstream, they will occupy a critical role in the buyer journey, and knowing trends in what buyers are finding out can help you reposition your offerings. In-depth research snapshots that analyze AI platforms like ChatGPT can help you get a handle on trends in this space.
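One way to see why a handful of queries isn't enough is to look at the uncertainty around a mention rate at different sample sizes. The sketch below uses the Wilson score interval, a standard method for proportions that I'm swapping in for illustration; it is not part of the analysis above. The function name `wilson_interval` and the counts (2 of 5, 46 of 100) are hypothetical.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion hits / n."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts: the same topic checked with 5 queries vs. 100 queries.
for hits, n in ((2, 5), (46, 100)):
    low, high = wilson_interval(hits, n)
    print(f"{hits}/{n} mentions: plausible rate {low:.0%} to {high:.0%}")
```

With only five queries, the plausible range spans most of the scale, so a topic that looks rarely mentioned could in fact be surfacing for most users, or the reverse.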

Omissions are challenging, but a research partner can help you get a handle on how they may be affecting your organization. Need help figuring out what to do next? Get in touch.


All About Eve… 6.

If you've made it this far, here's some bonus analysis – again on Eve 6. Anyone who was 1) alive in the 90s, 2) an alt-rock fan, and 3) a sci-fi enthusiast knows that the name of the band is from an X-Files episode, featuring a character named “Eve 6.”

ChatGPT's latest version, GPT-4, responded correctly in every instance in my sample, but GPT-3.5 is… less than confident. In fact, in most responses describing the name's origin, GPT-3.5 got it wrong. Yes, it's an excellent example of how AI models are getting more sophisticated over time, but I couldn't help but share a list of the top origins of the name "Eve 6" – according to GPT-3.5 – for your amusement:

  • Starship Troopers: 14 responses

  • All About Eve: 5

  • X-Files, TV Series: 4 (correct!)

  • X-Files, Movie: 3

  • Star Trek VI: The Undiscovered Country: 2

  • Videodrome: 2

  • Star Trek II: The Wrath of Khan: 1

  • The Bible: 1 (technically correct?)

  • “the name of one of the band member's previous classmates”: 1 (maybe??)

A chart which visually summarizes the aforementioned stats on ChatGPT's accuracy with regards to the origin of the Eve 6 band name.

Disclosure: This blog post was written with generative AI assistance and a human firmly in the driver's seat.
