intermediate
Auto-Tagger for Blog Posts
Build a smart tagging system that suggests tags for new posts based on similar published content.
Auto-Tagger for Blog Posts
Build a tagging system that learns from your existing content and suggests relevant tags for new posts. Using kNN, we can find similar published posts and recommend their tags.
What You’ll Learn
- Using kNN for multi-label suggestions
- Leveraging similarity scores for confidence
- Building a practical content management tool
Why kNN for Tagging?
Unlike Bayes (which picks one category), kNN returns similar items with similarity scores. This is perfect for tagging because:
- Posts can have multiple tags
- We can see why a tag was suggested (similar posts)
- New tags can be added without retraining
Project Setup
mkdir blog_tagger && cd blog_tagger
# Gemfile
source 'https://rubygems.org'
gem 'classifier'
The Auto-Tagger
Create auto_tagger.rb:
require 'classifier'
require 'json'
class AutoTagger
def initialize
@knn = Classifier::KNN.new(k: 5, weighted: true)
@post_tags = {} # Maps post content to its tags
end
# Add a published post with its tags
def add_post(content, tags)
tags = Array(tags)
@post_tags[content] = tags
# Add to kNN with each tag as a category
tags.each do |tag|
@knn.add(tag => content)
end
end
# Suggest tags for new content
def suggest_tags(content, max_tags: 5)
result = @knn.classify_with_neighbors(content)
return [] if result[:neighbors].empty?
# Tally tag votes weighted by similarity
tag_scores = Hash.new(0.0)
result[:neighbors].each do |neighbor|
similarity = neighbor[:similarity]
post_content = neighbor[:item]
# Get all tags from this similar post
@post_tags[post_content]&.each do |tag|
tag_scores[tag] += similarity
end
end
# Return top tags with confidence scores
tag_scores
.sort_by { |_, score| -score }
.first(max_tags)
.map { |tag, score| { tag: tag, confidence: normalize_score(score) } }
end
# Get explanation for why tags were suggested
def explain_suggestions(content, max_tags: 3)
result = @knn.classify_with_neighbors(content)
return {} if result[:neighbors].empty?
explanations = {}
suggest_tags(content, max_tags: max_tags).each do |suggestion|
tag = suggestion[:tag]
# Find which neighbors contributed to this tag
contributors = result[:neighbors].select do |n|
@post_tags[n[:item]]&.include?(tag)
end
explanations[tag] = {
confidence: suggestion[:confidence],
similar_posts: contributors.map do |c|
{
excerpt: c[:item][0..80] + "...",
similarity: (c[:similarity] * 100).round(1)
}
end
}
end
explanations
end
def save(path)
@knn.storage = Classifier::Storage::File.new(path: path)
@knn.save
File.write("#{path}.tags", @post_tags.to_json)
end
def self.load(path)
tagger = new
storage = Classifier::Storage::File.new(path: path)
tagger.instance_variable_set(:@knn, Classifier::KNN.load(storage: storage))
tagger.instance_variable_set(:@post_tags, JSON.parse(File.read("#{path}.tags")))
tagger
end
private
def normalize_score(score)
# Convert to 0-100 percentage (capped)
[(score * 50).round(1), 100.0].min
end
end
Training with Existing Posts
Create train.rb:
require_relative 'auto_tagger'
tagger = AutoTagger.new
# Sample blog posts (in real usage, load from your CMS/database)
posts = [
{
content: "Getting started with Ruby on Rails. This tutorial covers MVC architecture, routing, and building your first web application with Rails.",
tags: ["ruby", "rails", "tutorial", "web-development"]
},
{
content: "Understanding React hooks. Learn useState, useEffect, and custom hooks to manage state in functional components.",
tags: ["javascript", "react", "frontend", "tutorial"]
},
{
content: "Deploying Rails apps to Heroku. Step-by-step guide for production deployment including database setup and environment variables.",
tags: ["ruby", "rails", "deployment", "heroku", "devops"]
},
{
content: "CSS Grid vs Flexbox. When to use each layout system and how to combine them for responsive designs.",
tags: ["css", "frontend", "web-development", "tutorial"]
},
{
content: "Building REST APIs with Ruby. Design principles, authentication, and best practices for API development.",
tags: ["ruby", "api", "backend", "tutorial"]
},
{
content: "Introduction to TypeScript. Static typing for JavaScript, interfaces, and migrating existing projects.",
tags: ["javascript", "typescript", "tutorial", "frontend"]
},
{
content: "Docker containers for Ruby development. Creating Dockerfiles, docker-compose, and development workflows.",
tags: ["ruby", "docker", "devops", "tutorial"]
},
{
content: "React component testing with Jest. Unit tests, snapshot testing, and mocking API calls.",
tags: ["javascript", "react", "testing", "frontend"]
},
{
content: "PostgreSQL performance tuning. Indexes, query optimization, and monitoring database performance.",
tags: ["database", "postgresql", "performance", "backend"]
},
{
content: "Building a Rails API with GraphQL. Schema design, queries, mutations, and subscriptions.",
tags: ["ruby", "rails", "graphql", "api", "backend"]
},
]
posts.each do |post|
tagger.add_post(post[:content], post[:tags])
end
tagger.save('tagger.json')
puts "Trained on #{posts.length} posts"
puts "Tags in corpus: #{posts.flat_map { |p| p[:tags] }.uniq.sort.join(', ')}"
Suggesting Tags
Create suggest.rb:
require_relative 'auto_tagger'
tagger = AutoTagger.load('tagger.json')
# New post that needs tags
new_post = <<~POST
Building a Vue.js application with Vuex state management.
This guide covers setting up Vue 3, creating components,
and managing global state with Vuex stores.
POST
puts "New post:"
puts new_post
puts
puts "Suggested tags:"
suggestions = tagger.suggest_tags(new_post)
suggestions.each do |s|
puts " #{s[:tag]} (#{s[:confidence]}% confidence)"
end
puts "\nWhy these tags?"
explanations = tagger.explain_suggestions(new_post)
explanations.each do |tag, data|
puts "\n#{tag} (#{data[:confidence]}%):"
data[:similar_posts].each do |post|
puts " - #{post[:similarity]}% similar: \"#{post[:excerpt]}\""
end
end
Run it:
ruby train.rb
ruby suggest.rb
Output:
Suggested tags:
tutorial (25.1% confidence)
ruby (21.5% confidence)
javascript (19.5% confidence)
react (19.5% confidence)
frontend (19.5% confidence)
Why these tags?
tutorial (25.1%):
- 29.3% similar: "Understanding React hooks. Learn useState, useEffect..."
- 14.8% similar: "Building REST APIs with Ruby. Design principles..."
- 6.2% similar: "Docker containers for Ruby development..."
ruby (21.5%):
- 22.1% similar: "Building a Rails API with GraphQL. Schema design..."
- 14.8% similar: "Building REST APIs with Ruby. Design principles..."
- 6.2% similar: "Docker containers for Ruby development..."
javascript (19.5%):
- 29.3% similar: "Understanding React hooks. Learn useState, useEffect..."
- 9.6% similar: "React component testing with Jest. Unit tests..."
Integration with a Blog
Here’s how to integrate with a simple blog system:
class BlogPost
attr_accessor :title, :content, :tags
def initialize(title:, content:, tags: [])
@title = title
@content = content
@tags = tags
end
def self.tagger
@tagger ||= AutoTagger.load('tagger.json')
end
def suggest_tags(max: 5)
self.class.tagger.suggest_tags("#{title} #{content}", max_tags: max)
end
def auto_tag!(confidence_threshold: 40)
suggestions = suggest_tags
@tags = suggestions
.select { |s| s[:confidence] >= confidence_threshold }
.map { |s| s[:tag] }
end
end
# Usage
post = BlogPost.new(
title: "Kubernetes Deployment Strategies",
content: "Learn blue-green and canary deployments for zero-downtime releases..."
)
puts "Suggested: #{post.suggest_tags.map { |s| s[:tag] }.join(', ')}"
post.auto_tag!(confidence_threshold: 30)
puts "Auto-tagged with: #{post.tags.join(', ')}"
Best Practices
- Retrain regularly: As you add new posts, periodically rebuild the tagger
- Use descriptive content: Include title + body + maybe excerpt for better matching
- Review suggestions: Auto-tagging works best as a suggestion, not replacement
- Set confidence thresholds: Only auto-apply high-confidence tags
Next Steps
- kNN Basics - Deep dive into k-Nearest Neighbors
- Persistence Guide - Production storage strategies