Benefits and watch outs for using synthetic data in market research

Emily Hallwig

Market research is constantly evolving, and each technological advance brings new ways of collecting and understanding data.

One exciting recent development is that the industry is beginning to mine data in a completely different way — using existing data to produce synthetic information about respondents and ideas. 

As with many new forms of technology in business, this new way of utilizing data raises a lot of questions. How does this new form of data affect our existing insights? What are the benefits of exploring synthetic data? What challenges might arise? Can this type of data be trusted?

That’s what I’ll dig into here. Read on for a breakdown of some of the benefits and things to watch out for when using synthetic data for your research. 

Benefits of synthetic data

1. Faster, more flexible data collection

One of the most obvious benefits of synthetic data is the ability to build datasets faster. Using synthetic data creates a more flexible data collection process that increases scalability without increasing cost or time needed to collect data. 

Synthetic data uses prediction modeling, which is based on previously collected data, to anticipate what consumer responses could be. This allows market research companies to expand their datasets without worrying about growing privacy or data access concerns, while having the flexibility to simulate a variety of market scenarios.
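To make the idea concrete, here is a minimal sketch of what "predicting responses from previously collected data" could look like in its simplest form. It is an illustration, not how any particular vendor builds synthetic respondents: the sample rows, the segment labels and the function names (fit_segment_distributions, generate_synthetic) are all hypothetical, and a real system would use a far richer model than answer frequencies.

```python
import random
from collections import Counter, defaultdict

# Hypothetical real survey rows: (age_group, purchase_intent on a 1-5 scale).
real_responses = [
    ("18-34", 5), ("18-34", 4), ("18-34", 4), ("18-34", 2),
    ("35-54", 3), ("35-54", 3), ("35-54", 4), ("35-54", 1),
]

def fit_segment_distributions(rows):
    """Count how often each answer appears within each segment."""
    counts = defaultdict(Counter)
    for segment, answer in rows:
        counts[segment][answer] += 1
    return counts

def generate_synthetic(counts, segment, n, seed=0):
    """Draw n synthetic answers for a segment from its observed answer frequencies."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    answers = list(counts[segment].keys())
    weights = list(counts[segment].values())
    return rng.choices(answers, weights=weights, k=n)

distributions = fit_segment_distributions(real_responses)
synthetic = generate_synthetic(distributions, "18-34", n=100)
```

Note that the synthetic answers can only reflect what the real respondents in that segment actually said, which is why the quality of the underlying data matters so much.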

"The problem with survey data is that it’s small data. It’s not big data. But the good thing about it, it’s incredibly rich and incredibly insightful…I think what the really exciting opportunity is is to take synthetic approaches to the richness of the data to expand it so you can have rich big data rather than just rich small data."

- Steve Phillips, founder and CEO, Zappi

2. Get smarter over time

Because AI is constantly learning, market research companies working with synthetic data can teach and train their models to produce data in the way that best suits their needs.

Using prompts and robust real data, researchers can prime and optimize their AI tools to create smart synthetic data that learns over time. Whether the goal is to create synthetic respondents, synthetic predictions of answers, or new or optimized ideas, model training can improve how close these synthetic datasets are to real ones.

3. Enhance and enrich datasets

Another benefit of synthetic data is the ability to fill in gaps in existing datasets with more comprehensive or nuanced information. 

"We as researchers are in an incredibly brilliant position because we can be the data asset that is being used by AI to create new ideas and products. But the critical piece is that we are the ones who manage and worry about that data."

- Steve Phillips, founder and CEO, Zappi

Being able to create deeper data without having to run additional studies allows market researchers to widen the array of information that’s available with less cost and more efficiency. 

With that said, it’s vital to note that synthetic data is not a replacement for real consumer data. Continuing to collect real consumer data is critical, especially since good synthetic output depends on rich, real input.

4. Reduce privacy and data access concerns

Finally, as people grow more concerned about privacy and data breaches, handling data responsibly is crucial. Synthetic data offers a smart solution because its records are generated by algorithms rather than tied to an individual's personal information. In this way, we can conduct research without compromising individual privacy.

Watch outs

Though synthetic data has many benefits, there are some challenges to consider as well.

1. Bad data in = bad data out

The ability to generate synthetic responses and synthetic individuals is probably one of the biggest evolutions of technology in the data space. But the data you provide ultimately determines what comes out.

For instance, if the initial data is inaccurate or otherwise flawed and is used to train the AI that generates synthetic data, the resulting data will be just as flawed. This emphasizes the importance of using the highest quality data to lay the foundation for the synthetic data you create.

"Data is the underpinning of everything we do, and it’s that old adage of garbage in, garbage out…Frankly, we can start using AI on top of bad data but we’ll get bad outcomes. So you have to get the data right." 

- Steve Phillips, founder and CEO, Zappi

2. Using old data

Similarly, providing AI with old data (such as information that is months or years old) could result in synthetic data that is no longer relevant or useful in the present day. 

As the market constantly moves and shifts, so do the perspectives and values of customers. Using old data to make assumptions or lead marketing efforts has the potential to miss the mark entirely. This again hammers home the importance of continuously collecting current, representative consumer data and applying synthetic approaches to enhance that data, not replace it.
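One simple guard against stale inputs is to filter source data by collection date before it reaches a model. The sketch below assumes each record carries a "collected" date and uses a 180-day window; both the field name and the cutoff are illustrative assumptions, and the right window will depend on how quickly your category moves.

```python
from datetime import date, timedelta

def recent_only(rows, today, max_age_days=180):
    """Keep only rows collected within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    return [row for row in rows if row["collected"] >= cutoff]

# Hypothetical survey records with their collection dates.
survey_rows = [
    {"respondent": "r1", "collected": date(2024, 5, 1)},
    {"respondent": "r2", "collected": date(2022, 1, 15)},
]

# Pass "today" explicitly so the result does not depend on when this runs.
fresh_rows = recent_only(survey_rows, today=date(2024, 6, 1))
```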

3. Skewed data

Along the same lines, inputting data that’s skewed or doesn’t align with norms will result in data that is just as, if not more, skewed. Skew can come from many sources, including leading questions, response bias and geographical bias.

The more outliers an AI is fed, the less consistent the data will be. The moral here is again to only use quality data to cultivate synthetic results, and to maintain the practice of collecting and inputting new, real data.
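A quick way to spot one kind of skew before training is to compare the makeup of your sample with the population you want to represent. The following is a rough sketch under stated assumptions: the segment labels, the target shares and the 10-point tolerance are made up for illustration, and share_gaps is a hypothetical helper rather than part of any library.

```python
from collections import Counter

def share_gaps(sample_segments, target_shares):
    """Return sample share minus target share for each target segment."""
    counts = Counter(sample_segments)
    total = len(sample_segments)
    return {
        segment: counts.get(segment, 0) / total - target
        for segment, target in target_shares.items()
    }

# Hypothetical sample: 8 urban respondents and 2 rural ones.
sample = ["urban"] * 8 + ["rural"] * 2
# Assumed population shares the sample should roughly match.
targets = {"urban": 0.6, "rural": 0.4}

gaps = share_gaps(sample, targets)
# Flag any segment that is off by more than 10 percentage points.
flagged = [segment for segment, gap in gaps.items() if abs(gap) > 0.1]
```

Here both segments would be flagged, since urban respondents are over-represented and rural respondents under-represented by about 20 points each. A check like this only covers demographic balance; it will not catch leading questions or response bias.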

4. Biased or deceptive results

Because AI models are trained, they can also be inadvertently swayed or biased in one direction or another.

This can lead the synthetic data to favor certain demographics or thinking patterns rather than staying balanced and unbiased, which is another important watch out to stay on top of.

Conclusion

Synthetic data has immense potential for researchers who understand the challenges and necessary boundaries of its applications. Deepened data, scaled respondent feedback, reduced costs, smarter models and data assets are just some of the core benefits of embracing this new way of gathering insights.

Working with synthetic data can mean more innovation and less time spent on things that can easily be automated. But keep in mind: using synthetic data appropriately means using it to support and guide research while holding fast to the ethics that make up true market research, including authentic human insights and careful research practices.

"You have to think of this as a seminal moment to rethink the world that you’re in and the way you approach things. Otherwise you will get rapidly, and very rapidly nowadays, left behind."

- Steve Phillips, founder and CEO, Zappi

How researchers can use AI to do a reverse take over of marketing

For more content like this, check out our webinar with Steve Phillips, CEO and Founder of Zappi, who demonstrates how generative AI combined with a high-impact data system will shift the balance of power between insights and marketing.
