Overview
Dog breeds are a highly searched topic on the internet, and Pethotel.io wanted to answer those searches with valuable information about each breed. We needed to create content for over 200 dog breeds in multiple languages to cover as many of them as possible.
Objectives
- Create good content for over 200 dog breeds without manual effort.
- Support multiple languages to attract more traffic.
- Improve SEO by providing valuable information to the people who search for it.
- Keep content updated regularly to maintain relevance.
- Use a semantic front-end and rich snippets for attractive search results.
Background
Initially, our breed pages were lacking in-depth content. We wanted to offer more valuable information to our users and improve our search engine rankings. Manually writing content for each breed in multiple languages was out of the question, so we needed an automated solution.
We also wanted the content to feel natural and engaging, not like it was churned out by a machine. Finding the right balance between automation and quality was key.
Implementation
Approach
We built a background job called BreedContentUpdaterJob that runs periodically to update breed content. This job uses the BreedContentUpdaterService to process each breed in different languages.
Here's how the job is scheduled:
breed_content_updater_job:
  cron: "0 * * * *"
  class: "BreedContentUpdaterJob"
  queue: default
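The cron expression "0 * * * *" fires at minute 0 of every hour. As a rough illustration of the class this schedule points at, here is a minimal sketch. It assumes the job simply delegates to the service. The `perform(service)` signature and the `FakeUpdaterService` stand-in are hypothetical and exist only so the example runs without Rails or a queue library.

```ruby
# Hypothetical sketch; the real job would hook into the app's queue library
# (for example by inheriting from ApplicationJob), which is omitted here.
class BreedContentUpdaterJob
  # Passing the service in is an assumption that lets the sketch run without
  # Rails; a real job would more likely build the service itself.
  def perform(service)
    service.call
  end
end

# Stand-in for BreedContentUpdaterService, used only for this example.
class FakeUpdaterService
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def call
    @runs += 1
  end
end

service = FakeUpdaterService.new
BreedContentUpdaterJob.new.perform(service)
puts "Service runs: #{service.runs}"
```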
The service works by iterating over all breeds with their respective languages and updating their content by calling the ThinkingService, which interacts with a language model to generate or improve content sections like behavior, origin, temperament, and more.
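The iteration described here could look roughly like the following sketch. It is not the original service. The `breeds_by_language` hash, the `updater` callable (standing in for the logic that calls ThinkingService), and the locale codes are all assumptions made for illustration.

```ruby
# Hypothetical sketch of the breeds-by-language loop; not the original service.
class BreedContentUpdaterService
  # breeds_by_language: hash of locale code => list of breed records.
  # updater: callable that would wrap the ThinkingService calls for one breed.
  def initialize(breeds_by_language:, updater:)
    @breeds_by_language = breeds_by_language
    @updater = updater
  end

  # Visits every breed in every language and returns how many were processed.
  def call
    processed = 0
    @breeds_by_language.each do |language, breeds|
      breeds.each do |breed|
        @updater.call(breed, language)
        processed += 1
      end
    end
    processed
  end
end

# Example data; the locale codes and breeds are illustrative only.
breeds_by_language = {
  "en" => [{ name: "Beagle" }, { name: "Border Collie" }],
  "de" => [{ name: "Beagle" }]
}

log = []
updater = ->(breed, language) { log << "#{language}:#{breed[:name]}" }

count = BreedContentUpdaterService.new(
  breeds_by_language: breeds_by_language,
  updater: updater
).call
puts "Processed #{count} breed pages"
```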
Handling Multiple Languages
Supporting multiple languages added little extra work, so I decided to do it. I looped the same content creation task through all of our application languages, so each run asked the large language model to write its output in the target language.
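One way to request output in the target language is to build a small prompt fragment per locale. The sketch below assumes this approach. The `LANGUAGE_NAMES` mapping, the `language_instruction` helper, and the instruction wording are hypothetical, and the locale codes are examples rather than the site's real language list.

```ruby
# Hypothetical mapping from locale codes to language names; the codes are
# examples, not the application's actual languages.
LANGUAGE_NAMES = {
  "en" => "English",
  "de" => "German",
  "es" => "Spanish"
}.freeze

# Builds the prompt fragment that tells the model which language to write in.
# Raises ArgumentError for locales missing from the mapping.
def language_instruction(locale)
  name = LANGUAGE_NAMES.fetch(locale) do
    raise ArgumentError, "Unsupported locale: #{locale}"
  end
  "Write the content in #{name}."
end

# The same task is repeated once per language, only the fragment changes.
LANGUAGE_NAMES.each_key do |locale|
  puts language_instruction(locale)
end
```

Failing loudly on an unknown locale keeps a typo in the language list from silently producing content in the wrong language.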
Writing Style and Content Quality
The goal was natural, genuinely informative, human-like content that avoids generic or overly formal language. The prompts used in the BreedContentUpdaterService are designed to encourage the language model to produce engaging and valuable content. For example:
writing_style = "Provide improved, well-written content. Make sure that the content stays natural and human-like. Keep it highly valuable to the reader with just a touch of creativity but not too much."
Updating Specific Content Sections
The service updates several sections for each breed, such as behavior, origin, temperament, history, suitable home, common health issues, and more. It handles each section separately, so if one section fails to update due to an error, the others aren't affected.
And here's how the service updates a content section:
# === BEHAVIOR === #
begin
  system_model = "gpt-4o"
  system_prompt = "#{identity}, #{language}, #{good_content}, #{writing_style} IMPROVE THE FOLLOWING BEHAVIOR SUMMARY FOR #{breed_name}: #{behavior}"
  user_prompt = "IMPROVE THE BEHAVIOR SUMMARY FOR #{breed_name}. #{only_return_summary}"
  assistant_prompt = "Here is the improved behavior summary:"
  received_summary = thinking_service.answer(
    system_model,
    system_prompt,
    user_prompt,
    assistant_prompt,
    max_tokens,
    temperature,
    presence_penalty,
    frequency_penalty,
    prompt_request
  )
  received_summary_text = received_summary[:text]
  breed.update(behavior: received_summary_text)
  puts "Behavior updated!"
rescue StandardError => e
  puts "Error updating behavior: #{e}"
end
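The per-section isolation can also be expressed as a loop, so that each section gets its own rescue without repeating the block. This is a minimal sketch under stated assumptions, not the original code: the `SECTIONS` list, the `update_sections` helper, and the `generate` callable (standing in for `thinking_service.answer`) are hypothetical, and a plain hash stands in for the breed record.

```ruby
# Hypothetical section list; the article names these sections but not the
# attribute names used in the database.
SECTIONS = %i[behavior origin temperament history].freeze

# Updates each section independently. `generate` stands in for the call to
# the language model and must return the new text for a section.
# Returns a hash of section => :ok or the error message.
def update_sections(breed, generate)
  SECTIONS.each_with_object({}) do |section, results|
    begin
      breed[section] = generate.call(section, breed[section])
      results[section] = :ok
    rescue StandardError => e
      # The failure is recorded and the loop moves on to the next section.
      results[section] = e.message
    end
  end
end

breed = {
  behavior: "Friendly.",
  origin: "England.",
  temperament: "Calm.",
  history: "Old."
}

# Simulated generator that fails for exactly one section.
flaky = lambda do |section, text|
  raise "API timeout" if section == :origin
  "#{text} (improved)"
end

results = update_sections(breed, flaky)
puts results.inspect
```

Because the failing section keeps its previous text, a transient API error never leaves a breed page with an empty section.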
Challenges Faced
I made way too many tables
In hindsight, I should have managed everything in a single table. Instead of creating a separate table for each language, I should have kept all breeds in one table with a language column.
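The single-table idea can be pictured as one collection keyed by breed and language. The sketch below is a plain-Ruby stand-in rather than a real migration or model. The `BreedTable` class, the `slug`, `language`, and `behavior` column names, and the sample rows are all assumptions for illustration.

```ruby
# Hypothetical in-memory stand-in for a single breeds table with a language
# column; the column names are assumptions.
BreedRow = Struct.new(:slug, :language, :behavior, keyword_init: true)

class BreedTable
  def initialize
    @rows = {}
  end

  # One row per (slug, language) pair, mirroring a composite unique index.
  def upsert(slug:, language:, behavior:)
    @rows[[slug, language]] =
      BreedRow.new(slug: slug, language: language, behavior: behavior)
  end

  def find(slug, language)
    @rows[[slug, language]]
  end

  # Adding a language needs no new table, only new rows.
  def languages
    @rows.keys.map { |_slug, language| language }.uniq
  end
end

table = BreedTable.new
table.upsert(slug: "beagle", language: "en", behavior: "Curious and friendly.")
table.upsert(slug: "beagle", language: "de", behavior: "Neugierig und freundlich.")
puts table.find("beagle", "de").behavior
```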
Making human-like content required putting time into my prompts
One of the main challenges was ensuring the generated content was high-quality and didn't feel like it was written by a machine. We had to fine-tune the prompts and parameters to get the desired output.
Managing API costs was a concern, but they turned out surprisingly low
Interacting with language models via APIs can be costly, especially when processing a large amount of content. This content was generated with gpt-4-turbo, which cost around 80€ for everything. Better models are available now that would be cheaper and more efficient.
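To put the total in perspective, here is a rough back-of-the-envelope calculation. It uses the approximate figures from this write-up (around 80€, a little over 500,000 words, a little over 200 breeds), so the results are ballpark estimates rather than measured values.

```ruby
# Approximate figures from the article; the word and breed counts are the
# stated minimums, so the per-unit costs are slight overestimates.
total_cost_eur = 80.0
total_words = 500_000
breed_count = 200

cost_per_1000_words = total_cost_eur / total_words * 1000
words_per_breed = total_words / breed_count
cost_per_breed = total_cost_eur / breed_count

puts format("Cost per 1,000 words: %.2f EUR", cost_per_1000_words)
puts format("Words per breed (all languages): %d", words_per_breed)
puts format("Cost per breed (all languages): %.2f EUR", cost_per_breed)
```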
Heavy prompting added room for poor instructions
The prompts that guide the language model toward specific content sections contained many written instructions, and each one was a chance to get something wrong. Getting them right was a challenge and took multiple iterations to improve the quality of the generated content.
Results
I generated over half a million words of content across more than 200 dog breeds in multiple languages. Our SEO metrics soared: organic traffic increased, and we gained new and much better search rankings for breed-related keywords.
- Generated over 500,000 words of content for 200+ dog breeds
- Climbed into the top 5 search results for many new keywords
- Google respected the content and started to show rich snippets, which increased the click-through rate
Another goal was to increase reach and bring awareness to the site, but our progress toward it is very hard to measure. Traffic and time spent on the site have both increased, which is a good indicator that the content is engaging and valuable to users, but we have no evidence that these users turn into customers.
Lessons Learned
Automation Saves Time but Requires Thought to Implement
Automating content creation saved us a huge amount of time, but it also required careful planning and fine-tuning to get the desired results.
In this experiment, automation would still have saved many thousands of hours of manual work even if the quality of the content had not been perfect. The quality was relatively good, however, and the system generated content very close to what I would write myself.
Traffic gain is consistent but takes some time to build up depending on domain authority
It takes time for search engines to index and rank new content. We saw a gradual increase in organic traffic over several months as our content gained visibility in search results. So it's not a quick fix but a long-term bet.
Content needs to be valuable and regularly updated
The system can be improved over time by updating prompts, adjusting parameters, and incorporating user feedback to enhance content quality. The system is also capable of updating content consistently over time if needed.
Automating the content creation process allowed us to create content that would have been impossible to produce manually. The system generates high-quality, engaging content for over 200 dog breeds in multiple languages, enhancing our SEO performance and user engagement.