What is a Dataset? Do They Even Matter?
What is a dataset?
You might have been hearing a lot of talk about datasets recently. But what are they exactly?
Well, in short, datasets are the information the AI learns. Think about yourself. You went to elementary school, middle school, and high school, and with each grade you passed you learned more and more information.
Well, AI learns much the same way, but instead of taking 12 years to absorb all the information, the algorithm might take a few hours or days for a smaller model, and longer for the very largest ones. A large language model (LLM) has billions of internal settings that get adjusted as it learns (what computer scientists call “parameters”) and a huge dataset to work through, so it might take the AI some time to understand everything. When there’s less for the AI to learn, or when the dataset is smaller, the AI will learn it quicker.
And just like how what we as humans learn influences how we work and experience the world, so too do different datasets influence how an AI algorithm learns and operates.
Why are they important?
Datasets are important because they teach the AI algorithm the skills it needs to do its job.
For example, an LLM is trained on enormous amounts of text, often billions of words and phrases, so the AI can understand how to form sentences.
An AI for creating images is trained on millions of images so the algorithm can understand you when you ask it to create an image of a dog in a tutu eating a pizza.
And an AI used for predicting the weather is trained on decades of weather patterns to best predict where a tropical storm might hit based on wind, water temperature, and a bunch of other factors.
Generally, the more relevant, good-quality information a dataset has on a specific subject, the better the AI will be able to predict or create a particular output.
So what makes Propel’s dataset different?
Propel uses its own proprietary dataset to help fine-tune the AI algorithm used to generate pitches and press releases.
Wait, what? Let’s break it down.
Basically, Propel trained its algorithm using its own data, or in other words, its own information. The company is able to anonymize and utilize the pitches sent via the platform to understand what works and what doesn’t (in fact, this information is how we create our quarterly Media Barometer).
Because the information is Propel’s, and only Propel has it, it is considered “proprietary.” Drawing on over 5 million pitches and press releases sent through the platform, we know what the optimal pitch length is, what the optimal subject line should be, and how a press release and pitch should be formatted.
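To make that concrete, here is a rough Python sketch of how a number like “optimal pitch length” could be worked out from past pitches. It is not Propel’s actual code: the sample records, the function names, and the 100-word buckets are all made up for illustration.

```python
from collections import defaultdict

# Invented sample records, NOT real Propel data:
# (word count of the pitch, whether the recipient opened it)
SAMPLE_PITCHES = [
    (80, True), (95, True), (120, True), (140, False),
    (210, False), (230, True), (260, False), (410, False),
    (450, False), (90, True), (130, True), (220, False),
]


def open_rate_by_length(pitches, bucket_size=100):
    """Group pitches into word-count buckets and return each bucket's open rate."""
    opened = defaultdict(int)
    total = defaultdict(int)
    for word_count, was_opened in pitches:
        # 0-99 words -> bucket 0, 100-199 words -> bucket 100, and so on
        bucket = (word_count // bucket_size) * bucket_size
        total[bucket] += 1
        opened[bucket] += int(was_opened)
    return {bucket: opened[bucket] / total[bucket] for bucket in sorted(total)}


def best_bucket(rates):
    """Return the starting word count of the bucket with the highest open rate."""
    return max(rates, key=rates.get)


rates = open_rate_by_length(SAMPLE_PITCHES)
for bucket, rate in rates.items():
    print(f"{bucket}-{bucket + 99} words: {rate:.0%} opened")
print("Best-performing length bucket starts at", best_bucket(rates), "words")
```

With only a dozen invented records the result means nothing on its own. The point is the shape of the idea: the more real pitches you can count, the more you can trust which length bucket comes out on top.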
The AI algorithm Propel built is then fine-tuned with this data. Fine-tuning is exactly what it sounds like - tuning the AI to get what you want. Just like how a guitarist tunes their guitar to get the sound they want, so too does Propel fine-tune its algorithm to get what we PR pros want!
Here’s how it works: Propel uses OpenAI’s LLM for its general knowledge of words and phrases. To make sure the AI doesn’t go haywire and write a 7,000-word pitch, Propel’s algorithm tells the LLM how to write: How long should the first paragraph be? How long should a press release be? Where should the quotes go? How should an email subject line be written to give it the greatest chance of being opened?
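For the curious, the idea of “telling the LLM how to write” can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not Propel’s real implementation: the PitchGuidelines class, its placeholder numbers, and both helper functions are hypothetical, and the sketch stops short of actually calling any LLM.

```python
from dataclasses import dataclass


@dataclass
class PitchGuidelines:
    # Placeholder values for illustration only, not Propel's real settings.
    max_words: int = 150
    max_subject_chars: int = 60
    first_paragraph_sentences: int = 2


def build_instructions(guidelines: PitchGuidelines, topic: str) -> str:
    """Turn the guidelines into plain-language instructions for an LLM."""
    return (
        f"Write a media pitch about: {topic}\n"
        f"- Keep the pitch under {guidelines.max_words} words.\n"
        f"- Keep the subject line under {guidelines.max_subject_chars} characters.\n"
        f"- Limit the first paragraph to "
        f"{guidelines.first_paragraph_sentences} sentences.\n"
    )


def within_length(draft: str, guidelines: PitchGuidelines) -> bool:
    """Check that a draft stays at or under the word limit."""
    return len(draft.split()) <= guidelines.max_words


guidelines = PitchGuidelines()
prompt = build_instructions(guidelines, "a new line of dog tutus")
print(prompt)
```

The instructions go to the LLM along with the request, and a simple check like within_length can catch a runaway draft before anyone sees it.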
But what really sets Propel apart is the fact that it’s the only PR tool on the market with this data, and no one else can create pitch and press release drafts with the accuracy that Propel can. And because the dataset is constantly updating with each pitch sent through the platform, it always knows how to create the best results for your pitch and press release drafts.
You see? It’s all pretty simple.
Happy pitching.