Want to build a feed generator, real-time analytics dashboard, or bot that responds to Bluesky events? You need the Firehose - Bluesky's real-time event stream. This guide covers everything from what the Firehose is to building applications that consume it.
What is the Bluesky Firehose?
The Firehose is a live stream of every public event that happens on the Bluesky network. Every post, like, repost, follow, block, and profile update flows through this stream in real time.
Think of it as a river of data constantly flowing from all Bluesky users. By tapping into this stream, you can:
- Build custom feeds - Filter and curate posts for specific audiences
- Create analytics - Track trends, popular topics, and network statistics
- Power bots - Respond to mentions, keywords, or specific actions
- Send notifications - Alert users when something relevant happens
- Archive content - Store posts for research or backup
Two Ways to Access: Firehose vs Jetstream
Bluesky offers two ways to consume the real-time stream:
1. Raw Firehose (com.atproto.sync.subscribeRepos)
The original, full-fidelity stream:
- Format: CBOR (Concise Binary Object Representation)
- Content: Complete repository sync events with Merkle tree proofs
- Endpoint: wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
- Use case: When you need cryptographic verification or full sync capability
2. Jetstream (Recommended for Most Apps)
A developer-friendly alternative:
- Format: JSON (easy to parse in any language)
- Content: Record-level events with filtering options
- Endpoint: wss://jetstream2.us-east.bsky.network/subscribe
- Use case: Most applications - bots, feeds, analytics
Recommendation: Start with Jetstream. It's simpler, uses less bandwidth, and provides the same data for most use cases.
Understanding Event Types
Events in the stream correspond to AT Protocol collections. Each collection represents a type of record:
| Collection | Description |
|---|---|
| app.bsky.feed.post | Posts (including replies and quotes) |
| app.bsky.feed.like | Likes on posts |
| app.bsky.feed.repost | Reposts (shares) |
| app.bsky.graph.follow | Follow relationships |
| app.bsky.graph.block | Block relationships |
| app.bsky.graph.list | Lists (curation and moderation) |
| app.bsky.actor.profile | Profile updates |
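One way to act on these collection names is a small dispatch table keyed by collection. The sketch below assumes the Jetstream event shape shown later in this guide; `handlers`, `counts`, and `routeEvent` are illustrative names, not part of any Bluesky library.

```javascript
// Hypothetical counters updated by the handlers below.
const counts = { posts: 0, likes: 0, follows: 0 };

// Dispatch table: collection name -> handler function.
// A Map avoids accidental hits on inherited object properties.
const handlers = new Map([
  ['app.bsky.feed.post', () => { counts.posts += 1; }],
  ['app.bsky.feed.like', () => { counts.likes += 1; }],
  ['app.bsky.graph.follow', () => { counts.follows += 1; }],
]);

// Route a parsed event to its handler; returns true if a handler ran.
function routeEvent(event) {
  if (event.kind !== 'commit' || !event.commit) return false;
  const handler = handlers.get(event.commit.collection);
  if (!handler) return false;
  handler(event);
  return true;
}
```

Collections without a registered handler are ignored, so new event types can be added one entry at a time.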
Connecting to Jetstream
Here's how to connect to Jetstream and start receiving events:
Basic Connection (JavaScript/Node.js)
const WebSocket = require('ws');
const JETSTREAM_URL = 'wss://jetstream2.us-east.bsky.network/subscribe';
// Connect to all events
const ws = new WebSocket(JETSTREAM_URL);
ws.on('open', () => {
  console.log('Connected to Jetstream');
});
ws.on('message', (data) => {
  const event = JSON.parse(data.toString());
  console.log('Received event:', event);
});
ws.on('close', () => {
  console.log('Disconnected from Jetstream');
  // Implement reconnection logic
});
ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});
Filtering by Collection
Reduce bandwidth by requesting only the collections you need:
// Only receive posts
const postsUrl = 'wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post';
// Multiple collections: repeat the parameter once per collection
const postsAndLikesUrl = 'wss://jetstream2.us-east.bsky.network/subscribe?' +
  'wantedCollections=app.bsky.feed.post&' +
  'wantedCollections=app.bsky.feed.like';
const ws = new WebSocket(postsAndLikesUrl);
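Concatenating query strings by hand gets error-prone as the number of collections grows. A minimal sketch using Node's built-in `URL` class follows; `buildJetstreamUrl` is a hypothetical helper name, and the optional `cursor` argument assumes the cursor parameter described under Cursor-Based Recovery.

```javascript
// Build a subscribe URL with one wantedCollections parameter per collection.
// `buildJetstreamUrl` is an illustrative helper, not part of any library.
function buildJetstreamUrl(base, collections = [], cursor = null) {
  const url = new URL(base);
  for (const collection of collections) {
    url.searchParams.append('wantedCollections', collection);
  }
  if (cursor !== null) url.searchParams.set('cursor', String(cursor));
  return url.toString();
}

const url = buildJetstreamUrl(
  'wss://jetstream2.us-east.bsky.network/subscribe',
  ['app.bsky.feed.post', 'app.bsky.feed.like']
);
console.log(url);
```

The resulting string can be passed straight to `new WebSocket(...)`.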
Event Structure
A Jetstream commit event looks like this (the comments are annotations; the real payload is plain JSON):
{
  "did": "did:plc:xyz...", // User who created the event
  "time_us": 1702829400000000, // Microsecond timestamp, also usable as a cursor
  "kind": "commit", // Event kind: commit, identity, or account
  "commit": {
    "rev": "3abc...", // Repository revision
    "operation": "create", // create, update, or delete
    "collection": "app.bsky.feed.post",
    "rkey": "3abc...", // Record key
    "record": { // The actual content (absent on delete)
      "$type": "app.bsky.feed.post",
      "text": "Hello, Bluesky!",
      "createdAt": "2025-12-17T12:00:00Z"
    },
    "cid": "bafy..." // Content ID of the record (absent on delete)
  }
}
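Two fields from this structure come up constantly: the record's AT URI (assembled from `did`, `collection`, and `rkey`) and the event time (`time_us` is in microseconds, while JavaScript dates use milliseconds). A small sketch with hypothetical helper names `toAtUri` and `toDate`:

```javascript
// Assemble the AT URI for the record a commit event refers to.
function toAtUri(event) {
  const { collection, rkey } = event.commit;
  return `at://${event.did}/${collection}/${rkey}`;
}

// time_us is microseconds since the Unix epoch; Date expects milliseconds.
function toDate(event) {
  return new Date(Math.floor(event.time_us / 1000));
}

// Sample event with made-up identifiers, for illustration only.
const sample = {
  did: 'did:plc:example',
  time_us: 1702829400000000,
  kind: 'commit',
  commit: { operation: 'create', collection: 'app.bsky.feed.post', rkey: '3abc' },
};

console.log(toAtUri(sample));
console.log(toDate(sample).toISOString());
```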
Building a Keyword Monitor
Here's a complete example that monitors posts for specific keywords:
const WebSocket = require('ws');
class KeywordMonitor {
  constructor(keywords) {
    this.keywords = keywords.map(k => k.toLowerCase());
    this.ws = null;
  }
  connect() {
    const url = 'wss://jetstream2.us-east.bsky.network/subscribe?' +
      'wantedCollections=app.bsky.feed.post';
    this.ws = new WebSocket(url);
    this.ws.on('open', () => {
      console.log('Monitoring for keywords:', this.keywords);
    });
    this.ws.on('message', (data) => {
      this.handleMessage(JSON.parse(data.toString()));
    });
    this.ws.on('close', () => {
      console.log('Connection closed, reconnecting in 5s...');
      setTimeout(() => this.connect(), 5000);
    });
    this.ws.on('error', (error) => {
      console.error('WebSocket error:', error);
    });
  }
  handleMessage(event) {
    // Only newly created records carry text worth matching
    if (event.kind !== 'commit') return;
    if (event.commit.operation !== 'create') return;
    const record = event.commit.record;
    if (!record || !record.text) return;
    const text = record.text.toLowerCase();
    for (const keyword of this.keywords) {
      if (text.includes(keyword)) {
        this.onMatch(event, keyword);
        break; // Report each post once, even if several keywords match
      }
    }
  }
  onMatch(event, keyword) {
    console.log(`\n--- Match found for "${keyword}" ---`);
    console.log(`User: ${event.did}`);
    console.log(`Text: ${event.commit.record.text}`);
    // time_us is in microseconds; Date expects milliseconds
    console.log(`Time: ${new Date(event.time_us / 1000).toISOString()}`);
    // Build the post's AT URI
    const postUri = `at://${event.did}/app.bsky.feed.post/${event.commit.rkey}`;
    console.log(`URI: ${postUri}`);
  }
}
// Usage
const monitor = new KeywordMonitor(['bluesky', 'atprotocol', 'skyscraper']);
monitor.connect();
Building a Feed Generator
Feed generators use the Firehose to collect posts and serve custom feeds. Here's the architecture:
const express = require('express');
const app = express();
// matchesFeedCriteria, saveToDatabase, and getPostsFromDatabase are placeholders for your own logic
// 1. Consume the firehose
const collectPosts = (event) => {
  // Identity and account events have no commit; deletes have no record
  if (event.kind !== 'commit') return;
  if (event.commit.collection !== 'app.bsky.feed.post') return;
  if (event.commit.operation !== 'create') return;
  const post = {
    uri: `at://${event.did}/app.bsky.feed.post/${event.commit.rkey}`,
    cid: event.commit.cid,
    author: event.did,
    text: event.commit.record.text,
    createdAt: event.commit.record.createdAt,
    indexedAt: new Date().toISOString()
  };
  // 2. Apply your feed logic
  if (matchesFeedCriteria(post)) {
    saveToDatabase(post);
  }
};
// 3. Serve the feed via API
app.get('/xrpc/app.bsky.feed.getFeedSkeleton', (req, res) => {
  const { feed, cursor, limit } = req.query;
  const posts = getPostsFromDatabase(feed, cursor, limit);
  res.json({
    cursor: posts.length ? posts[posts.length - 1].indexedAt : undefined,
    feed: posts.map(p => ({ post: p.uri }))
  });
});
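The architecture above leaves `matchesFeedCriteria`, `saveToDatabase`, and `getPostsFromDatabase` undefined. As a rough illustration only, here are in-memory stand-ins for all three; the keyword list, the 100-post page cap, and the newest-first array are assumptions, and a real feed generator would use a database.

```javascript
// Hypothetical feed rule: keep posts mentioning any of these keywords.
const FEED_KEYWORDS = ['bluesky', 'atproto'];
const posts = []; // newest first, assuming posts are saved in arrival order

function matchesFeedCriteria(post) {
  const text = (post.text || '').toLowerCase();
  return FEED_KEYWORDS.some((keyword) => text.includes(keyword));
}

function saveToDatabase(post) {
  posts.unshift(post);
}

// Return up to `limit` posts older than `cursor` (an indexedAt timestamp).
// `feed` is unused here because this sketch serves a single feed.
function getPostsFromDatabase(feed, cursor, limit = 50) {
  // Query-string values arrive as strings, so parse and clamp to 1..100.
  const max = Math.min(Math.max(parseInt(limit, 10) || 50, 1), 100);
  const older = cursor ? posts.filter((p) => p.indexedAt < cursor) : posts;
  return older.slice(0, max);
}
```

Paginating on `indexedAt` alone skips posts that share a timestamp with the cursor; combining the timestamp with a unique key such as the post's CID would avoid that.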
Using the Raw Firehose
If you need the raw Firehose for verification or sync capabilities:
const WebSocket = require('ws');
const cbor = require('cbor');
const { CarReader } = require('@ipld/car');
const { CID } = require('multiformats/cid');
const FIREHOSE_URL = 'wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos';
const ws = new WebSocket(FIREHOSE_URL);
// The generic cbor decoder returns CID links as tag 42 wrapping a byte string
// with a leading 0x00; strip that byte and parse the rest as a CID
const toCid = (link) => CID.decode(link.value.subarray(1));
ws.on('message', async (data) => {
  // Each frame is two concatenated CBOR objects: a header, then a body
  const [header, body] = cbor.decodeAllSync(data);
  if (header.op === 1 && header.t === '#commit') {
    // body.blocks contains a CAR file with the records
    const car = await CarReader.fromBytes(body.blocks);
    for (const op of body.ops) {
      console.log('Operation:', op.action, op.path);
      // Deletes carry no CID
      if (op.cid) {
        const block = await car.get(toCid(op.cid));
        if (!block) continue; // Record not included in this frame
        const record = cbor.decode(block.bytes);
        console.log('Record:', record);
      }
    }
  }
});
ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});
Cursor-Based Recovery
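Both Firehose and Jetstream support cursors for recovering missed events. Jetstream's cursor is the time_us value of the last event you processed; the raw Firehose uses the seq number carried in each message instead. The example below uses Jetstream: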
const WebSocket = require('ws');
class ResilientConsumer {
  constructor() {
    this.lastCursor = null;
    this.ws = null;
  }
  connect() {
    let url = 'wss://jetstream2.us-east.bsky.network/subscribe';
    // Resume from last position if available
    if (this.lastCursor) {
      url += `?cursor=${this.lastCursor}`;
    }
    this.ws = new WebSocket(url);
    this.ws.on('message', (data) => {
      const event = JSON.parse(data.toString());
      this.processEvent(event);
      // Advance the cursor only after the event has been processed
      this.lastCursor = event.time_us;
    });
    this.ws.on('close', () => {
      // Reconnect with cursor to resume where we left off
      setTimeout(() => this.connect(), 5000);
    });
    this.ws.on('error', (error) => {
      // Without an 'error' listener, a failed connection would crash the process
      console.error('WebSocket error:', error);
    });
  }
  processEvent(event) {
    // Your processing logic
  }
}
Jetstream Instances
Bluesky provides multiple Jetstream instances for redundancy:
- wss://jetstream1.us-east.bsky.network/subscribe
- wss://jetstream2.us-east.bsky.network/subscribe
- wss://jetstream1.us-west.bsky.network/subscribe
- wss://jetstream2.us-west.bsky.network/subscribe
Choose based on your geographic location, or implement failover between instances.
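Failover between these instances can be sketched as a small selector that rotates to the next endpoint after each failure and backs off exponentially. The class name `InstanceSelector`, the one-second base delay, and the 30-second cap are assumptions chosen for illustration.

```javascript
// Instance list taken from this guide; the rotation policy is an assumption.
const INSTANCES = [
  'wss://jetstream1.us-east.bsky.network/subscribe',
  'wss://jetstream2.us-east.bsky.network/subscribe',
  'wss://jetstream1.us-west.bsky.network/subscribe',
  'wss://jetstream2.us-west.bsky.network/subscribe',
];

class InstanceSelector {
  constructor(instances, baseDelayMs = 1000, maxDelayMs = 30000) {
    this.instances = instances;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.index = 0;
    this.failures = 0;
  }

  // The endpoint to try next.
  current() {
    return this.instances[this.index];
  }

  // Call after a failed or dropped connection: rotate to the next instance
  // and return how many milliseconds to wait before reconnecting.
  fail() {
    this.index = (this.index + 1) % this.instances.length;
    const delay = Math.min(this.baseDelayMs * 2 ** this.failures, this.maxDelayMs);
    this.failures += 1;
    return delay;
  }

  // Call once a connection opens, so the next failure starts from the base delay.
  succeed() {
    this.failures = 0;
  }
}
```

In a consumer, the `close` handler would call `fail()` and pass the returned delay to `setTimeout`, while the `open` handler would call `succeed()`.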
Best Practices
Performance
- Filter at the source - Use wantedCollections to reduce bandwidth
- Buffer events - Don't process synchronously; queue for async handling
- Batch database writes - Insert in batches rather than per-event
- Monitor memory - High-volume streams can consume significant RAM
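The buffering and batching advice above might look like the following sketch. `BatchBuffer` is a hypothetical class, and `flushFn` stands in for whatever bulk insert your database client offers.

```javascript
// Collects items and hands them to `flushFn` in groups of up to `maxSize`.
class BatchBuffer {
  constructor(flushFn, maxSize = 500) {
    this.flushFn = flushFn;
    this.maxSize = maxSize;
    this.items = [];
  }

  // Queue one item; flush automatically when the batch is full.
  add(item) {
    this.items.push(item);
    if (this.items.length >= this.maxSize) this.flush();
  }

  // Hand over whatever is queued and start a fresh batch.
  flush() {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.flushFn(batch);
  }
}

// Demo with a tiny batch size; a real flushFn would write to a database.
const batches = [];
const buffer = new BatchBuffer((batch) => batches.push(batch), 3);
for (let i = 0; i < 7; i += 1) buffer.add({ id: i });
buffer.flush(); // flush the remainder, e.g. on a timer or at shutdown
console.log(batches.map((b) => b.length)); // [ 3, 3, 1 ]
```

Calling `flush()` on a timer as well keeps a quiet stream from holding events in memory indefinitely.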
Reliability
- Implement reconnection - Connections will drop; auto-reconnect is essential
- Persist cursors - Save cursor to disk/database for crash recovery
- Handle backpressure - If you can't keep up, events will be dropped
- Use multiple instances - Failover to another Jetstream if one is down
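Persisting the cursor can be as simple as writing it to a file. The sketch below assumes a Jetstream-style numeric cursor; the `CursorStore` class and the temp-directory path are hypothetical choices for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// File-backed cursor store (illustrative, not part of any library).
class CursorStore {
  constructor(filePath) {
    this.filePath = filePath;
  }

  // Returns the saved cursor as a number, or null if none is usable.
  load() {
    try {
      const value = Number(fs.readFileSync(this.filePath, 'utf8').trim());
      return Number.isSafeInteger(value) && value > 0 ? value : null;
    } catch (err) {
      return null; // missing or unreadable file: start from the live stream
    }
  }

  // Write to a temp file and rename it, so a crash mid-write
  // cannot leave a truncated cursor behind.
  save(cursor) {
    const tmp = `${this.filePath}.tmp`;
    fs.writeFileSync(tmp, String(cursor));
    fs.renameSync(tmp, this.filePath);
  }
}

const store = new CursorStore(path.join(os.tmpdir(), 'jetstream-cursor.txt'));
store.save(1702829400000000);
console.log(store.load());
```

Saving on every event would be wasteful at Firehose volumes; saving every few seconds or every few thousand events is usually enough, at the cost of replaying a short window after a crash.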
Rate Considerations
- Peak volume - Thousands of events per second during busy periods
- Bandwidth - Raw Firehose uses 4-8 GB/hour; Jetstream is more efficient
- Storage growth - Plan for significant data if archiving all events
Use Cases at Skyscraper
Here's how we use the Firehose for Skyscraper's features:
Trending Hashtags
We consume all posts, extract hashtags, and calculate trending scores based on volume and velocity.
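As a rough illustration of the idea (not Skyscraper's actual scoring), the sketch below extracts hashtags with a regular expression and ranks them by volume in the current time window plus a bonus for growth over the previous window. The function names, the regex, and the scoring formula are all assumptions.

```javascript
// Extract unique lowercase hashtags from post text. The regex is an
// approximation and will also match things like URL fragments.
function extractHashtags(text) {
  const matches = text.match(/#[\p{L}\p{N}_]+/gu) || [];
  return [...new Set(matches.map((tag) => tag.slice(1).toLowerCase()))];
}

// Count how many posts in a window mention each hashtag.
function countTags(texts) {
  const counts = new Map();
  for (const text of texts) {
    for (const tag of extractHashtags(text)) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical score: volume in the current window plus any growth
// (velocity) compared with the previous window.
function trendingScores(currentTexts, previousTexts) {
  const current = countTags(currentTexts);
  const previous = countTags(previousTexts);
  return [...current.entries()]
    .map(([tag, volume]) => {
      const velocity = volume - (previous.get(tag) || 0);
      return { tag, volume, velocity, score: volume + Math.max(0, velocity) };
    })
    .sort((a, b) => b.score - a.score);
}
```

A hashtag that appears three times now and never before outranks one with the same volume that was already popular in the previous window.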
Keyword Alerts
When users configure alerts, we filter the stream for matching keywords and send push notifications.
Analytics
We aggregate statistics about posting patterns, popular topics, and network growth.
Frequently Asked Questions
What is the Bluesky Firehose?
A real-time stream of all public events on Bluesky - posts, likes, follows, and more - delivered via WebSocket.
What is Jetstream?
Bluesky's developer-friendly API for consuming the Firehose. It provides JSON format and filtering capabilities.
Should I use Firehose or Jetstream?
Use Jetstream for most applications. Use the raw Firehose only if you need cryptographic verification or full repository sync.
How much data does the Firehose produce?
The raw Firehose produces 4-8 GB per hour. Jetstream with filtering uses significantly less bandwidth.