Social Media Scraping Agents: X/Twitter & BlueSky Landscape for Marketing Engagement (2026)

Summary

The landscape for social media scraping and AI-powered engagement in 2026 spans five layers: (1) MCP-native data tools that give AI agents direct access to X/Twitter and BlueSky data (XActions with 140+ MCP tools, x-twitter-scraper at $0.00015/call, Xpoz), (2) Python scraping libraries that bypass official APIs via GraphQL/cookies (twscrape, Scweet v4, ElizaOS agent-twitter-client), (3) browser extensions for inline reply generation (Qura AI, TweetStorm, XReplyGPT), (4) commercial SaaS platforms that automate the full monitor-reply pipeline (ReplyGuy, TweetHunter, TrendRadar), and (5) workflow orchestration frameworks for custom pipelines (LangGraph social-media-agent, n8n with 515+ templates). BlueSky is dramatically more scraper-friendly than X due to the open AT Protocol, free Jetstream WebSocket firehose (~850 MB/day for all posts), and no authentication needed for public data reads. X's official API starts at $200/month for Basic read access, but pay-per-use launched February 2026 at ~$0.01/tweet. Platform risk is the dominant constraint: X explicitly bans automated keyword-based replies and will suspend accounts; BlueSky allows bots but mandates opt-in interaction only (users must tag the bot).

1. Technical Approaches to Scraping

1.1 X/Twitter Scraping Methods

Official API (Updated 2026-04-04)

X's API pricing underwent significant changes with a new pay-per-use model launching February 6, 2026:

Tier	Cost	Read Volume	Write Volume	Notes
Free	$0	Minimal	1,500 posts/mo	Posting only, minimal read access
Basic	$200/mo	10K tweets/mo	3,000 posts/mo	Raised from $100 in Oct 2024
Pro	$5,000/mo	1M tweets/mo	300K posts/mo	Full search, streaming
Enterprise	$42K-50K+/mo	Custom	Custom	Full firehose, analytics
Pay-per-use	~$0.01/tweet	2M cap	Variable	New Feb 2026, credit-based

(Source: X API Pricing 2026, Postproxy)

The free tier is intentionally posting-only — X wants developers to pay for read access. For monitoring/scraping use cases, the minimum entry is $200/month or the new pay-per-use model.
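
To make the read-cost comparison concrete, here is a minimal sketch that uses only the prices quoted in the table above. The function and constant names are illustrative, and the model ignores write quotas and any volume discounts.

```python
# Illustrative cost model built from the prices quoted in the pricing table.
BASIC_MONTHLY_USD = 200.0   # Basic tier flat fee
BASIC_READ_CAP = 10_000     # tweets/month included in Basic
PAY_PER_USE_USD = 0.01      # approximate pay-per-use price per tweet read


def pay_per_use_cost(tweets_per_month: int) -> float:
    """Approximate monthly read cost on the pay-per-use model."""
    return round(tweets_per_month * PAY_PER_USE_USD, 2)


def basic_effective_price() -> float:
    """Per-tweet price of Basic if the full read cap is consumed."""
    return BASIC_MONTHLY_USD / BASIC_READ_CAP


if __name__ == "__main__":
    print(pay_per_use_cost(BASIC_READ_CAP))  # cost of reading Basic's full cap via pay-per-use
    print(basic_effective_price())           # best-case per-tweet price on Basic
```

At these quoted figures, reading Basic's full 10K-tweet allowance through pay-per-use would cost about half the Basic fee, so for read-only monitoring the flat tier appears to pay off only if its write quota or other features are also needed.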

Unofficial API Scraping (GraphQL/Cookie-based)

These tools reverse-engineer X's internal GraphQL endpoints used by the web client, authenticating via browser cookies rather than API keys:

Tool	Language	Auth Method	Account Pooling	Key Feature
twscrape	Python	Multi-account + IMAP email verification	Yes (SQLite)	Auto account switching on rate limit
Scweet v4	Python	Multi-account + proxy	Yes (SQLite)	DB-first provisioning, heartbeats, cooldowns
ElizaOS agent-twitter-client	TypeScript	Cookie-based (auth_token, ct0, twid)	No	Works without any API key
XActions	Node.js	Puppeteer browser automation	No	140+ MCP tools, cross-platform

twscrape (github.com/vladkens/twscrape): The most mature Python scraper. Supports async/await for parallel scraping, automatic account switching when one hits rate limits, login flow with email verification code reception via IMAP, and cookie persistence in SQLite. A single account handles hundreds to a few thousand tweets/day; multi-account pooling scales proportionally. (Source: twscrape GitHub)

Scweet v4 (github.com/Altimis/Scweet): Released February 2026. Moved to API-only core using X's GraphQL endpoints. Smart multi-account pooling with SQLite managing leases, heartbeats, daily counters, cooldowns, and automatic failover. Proxy support pairs each account with a different IP. (Source: Scweet 2026 Guide)
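
The SQLite-backed account pooling described for twscrape and Scweet can be sketched as follows. This is not either library's actual schema or API: the table layout, function names, and 15-minute default cooldown are all assumptions chosen to illustrate the idea of switching accounts when one hits a rate limit.

```python
import sqlite3
import time
from typing import Optional


def open_pool(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a hypothetical account-pool database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        " username TEXT PRIMARY KEY,"
        " cooldown_until REAL NOT NULL DEFAULT 0,"
        " requests_today INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_account(conn: sqlite3.Connection, username: str) -> None:
    conn.execute("INSERT OR IGNORE INTO accounts (username) VALUES (?)", (username,))
    conn.commit()


def acquire(conn: sqlite3.Connection, now: Optional[float] = None) -> Optional[str]:
    """Return the least-used account that is not cooling down, or None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT username FROM accounts WHERE cooldown_until <= ?"
        " ORDER BY requests_today ASC, username ASC LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    # Count the request against the account's daily counter.
    conn.execute(
        "UPDATE accounts SET requests_today = requests_today + 1 WHERE username = ?",
        (row[0],),
    )
    conn.commit()
    return row[0]


def mark_rate_limited(
    conn: sqlite3.Connection,
    username: str,
    cooldown_seconds: float = 900.0,
    now: Optional[float] = None,
) -> None:
    """Put an account on cooldown so acquire() skips it until the cooldown ends."""
    now = time.time() if now is None else now
    conn.execute(
        "UPDATE accounts SET cooldown_until = ? WHERE username = ?",
        (now + cooldown_seconds, username),
    )
    conn.commit()
```

A caller would acquire an account before each request and call mark_rate_limited when the endpoint signals a limit; when acquire returns None, every account is cooling down and the scraper should wait.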

ElizaOS agent-twitter-client (github.com/elizaos/agent-twitter-client): Cookie-based Twitter client that avoids API costs entirely. Essential cookies: auth_token, ct0, twid. Known issue: "Session is invalid" errors after cookie expiry and suspicious login alerts from Twitter. There is also an MCP wrapper (github.com/ryanmac/agent-twitter-client-mcp) that exposes this as an MCP server for AI agents. (Source: ElizaOS GitHub)

Browser Automation

Tool	Browser Engine	Approach	Stealth
XActions	Puppeteer	Full browser automation, no API	Built-in
twitter-automation-ai	Selenium + undetected-chromedriver	Multi-account, keyword-based	selenium-stealth, random user-agents, proxy rotation
Playwright approaches	Playwright	Minimal scripts for posting/interacting	Limited
Browser Use	Any (LLM-controlled)	Natural language task -> browser actions	Variable

MCP-Native Tools (New Category, 2026)

A major 2026 trend: social media data exposed via Model Context Protocol, letting AI agents query social data through natural language:

XActions (github.com/nirholas/XActions): The most comprehensive open-source toolkit. 140+ MCP tools across scraping, posting, engagement, analytics, streaming. Supports X/Twitter, BlueSky, Mastodon, and Threads. Runs entirely locally — no data leaves the machine. Also includes CLI, Node.js library, browser extension, and 50+ browser scripts. MIT license. Added workflow engine with declarative JSON pipelines, real-time streaming via Socket.IO, sentiment analysis, and social graph mapping in v3.1.0. (Source: XActions GitHub)

x-twitter-scraper (github.com/Xquik-dev/x-twitter-scraper): 120 REST API endpoints, 2 MCP tools, 23 extraction types. Reads at $0.00015/call — 33x cheaper than official API. Works with 40+ AI agents including Claude Code, Cursor, Codex, Copilot. Credit-based pricing: 1 credit = $0.00015, read ops cost 1-3 credits. $20/month subscription available. (Source: x-twitter-scraper GitHub)

Xpoz: MCP-first platform enabling natural language queries through AI assistants like Claude and ChatGPT. (Source: Xpoz)

Apify MCP: Apify's Twitter scraper now has an MCP server endpoint, enabling AI agents to programmatically scrape tweets. $0.25 per 1,000 tweets. (Source: Apify Twitter MCP)

Commercial Scraping APIs

Provider	Pricing	Key Feature
Apify	$0.25-0.40/1K tweets	Pre-built actors, MCP server, no-code
TwitterAPI.io	$0.15/1K tweets	Pay-as-you-go, 1K+ req/sec
Bright Data	Usage-based	Proxy infrastructure, social media scrapers
ScrapeCreators	Usage-based	Real-time social media scraping APIs
EnsembleData	Usage-based	Multi-platform social media data APIs

1.2 BlueSky Scraping Methods

BlueSky's AT Protocol is fundamentally different from X — it's an open protocol designed for interoperability. This makes it the most scraper-friendly major social platform.

AT Protocol Public API (Free, No Auth for Reads)

BlueSky's API is fully open for public data reads with no authentication needed for profiles and posts. Search functionality (app.bsky.feed.searchPosts) now requires authentication but is still free to use.

Key endpoints:

app.bsky.feed.getAuthorFeed — get posts from a specific user
app.bsky.feed.searchPosts — keyword search (auth required)
app.bsky.actor.searchActors — find users
com.atproto.sync.subscribeRepos — full firehose subscription
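
As a rough illustration of how the unauthenticated read endpoints above are addressed, the sketch below builds an XRPC request URL. The host public.api.bsky.app is an assumption (it is not named in this report), and the helper name is hypothetical.

```python
from typing import Optional, Union
from urllib.parse import urlencode

# Assumption: the public AppView host for unauthenticated reads.
PUBLIC_APPVIEW = "https://public.api.bsky.app"


def xrpc_url(method: str, **params: Optional[Union[str, int]]) -> str:
    """Build an XRPC GET URL for a lexicon method such as app.bsky.feed.getAuthorFeed."""
    # Drop unset parameters so optional arguments can be passed as None.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    base = f"{PUBLIC_APPVIEW}/xrpc/{method}"
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    print(xrpc_url("app.bsky.feed.getAuthorFeed", actor="bsky.app", limit=5))
```

The resulting URL can be fetched with any HTTP client; for app.bsky.feed.searchPosts an authenticated session would be needed, as noted above.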

Rate limits (generous):

Metric	Limit
Points/hour	5,000
Points/day	35,000
CREATE cost	3 points
UPDATE cost	2 points
DELETE cost	1 point
Max creates/hour	~1,666
Max creates/day	~11,666
API requests/5 min	3,000 (per IP)

(Source: BlueSky Rate Limits)
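
The "max creates" rows follow directly from the points budget. A small sketch of that arithmetic, using only the figures in the table (the helper names are illustrative):

```python
# Points budget and per-action costs as listed in the rate-limit table.
POINTS_PER_HOUR = 5_000
POINTS_PER_DAY = 35_000
COST = {"create": 3, "update": 2, "delete": 1}


def max_ops(action: str, budget: int) -> int:
    """Largest number of operations of one kind that fits in a points budget."""
    return budget // COST[action]


def points_used(creates: int = 0, updates: int = 0, deletes: int = 0) -> int:
    """Points consumed by a mixed workload."""
    return creates * COST["create"] + updates * COST["update"] + deletes * COST["delete"]


if __name__ == "__main__":
    print(max_ops("create", POINTS_PER_HOUR))  # hourly ceiling on creates
    print(max_ops("create", POINTS_PER_DAY))   # daily ceiling on creates
```

A bot that mixes posts, edits, and deletes can call points_used before each batch and compare the result against the hourly and daily budgets.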

Jetstream (Real-Time WebSocket Firehose)

Jetstream is BlueSky's official simplified streaming solution — a WebSocket server that consumes the full AT Protocol firehose and redistributes it as simple JSON. This is the key differentiator for BlueSky monitoring.

How it works:

Full AT Proto Firehose (CBOR) → Jetstream Server → JSON WebSocket → Your Client

Connection: wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post

Official instances:

jetstream1.us-east.bsky.network
jetstream2.us-east.bsky.network
jetstream1.us-west.bsky.network
jetstream2.us-west.bsky.network

Filtering:

By collection NSID: filter to only posts, likes, follows, etc. (max 100 collections)
By repo DID: filter to specific users (max 10,000 DIDs)
Supports NSID prefixes like app.bsky.*
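
The filtering options above map onto query parameters of the subscribe URL. The sketch below assembles such a URL and enforces the stated limits; wantedCollections appears in the connection string above, while wantedDids is assumed here to be the parameter name for DID filtering, and the function name is hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode

MAX_COLLECTIONS = 100   # limit stated above
MAX_DIDS = 10_000       # limit stated above


def jetstream_url(host: str, collections: Sequence[str] = (), dids: Sequence[str] = ()) -> str:
    """Build a Jetstream subscribe URL with collection and DID filters."""
    if len(collections) > MAX_COLLECTIONS:
        raise ValueError("too many collections")
    if len(dids) > MAX_DIDS:
        raise ValueError("too many DIDs")
    # Each filter value is sent as a repeated query parameter.
    pairs = [("wantedCollections", c) for c in collections]
    pairs += [("wantedDids", d) for d in dids]  # assumed parameter name
    query = urlencode(pairs)
    base = f"wss://{host}/subscribe"
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    print(jetstream_url("jetstream2.us-east.bsky.network", ["app.bsky.feed.post"]))
```

With a single collection filter this reproduces the connection string shown earlier; adding DIDs narrows the stream to specific accounts.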

Bandwidth: ~850 MB/day for all posts on the network. Compressed messages are ~56% smaller than raw JSON.

Trade-off: No cryptographic signatures or Merkle tree nodes — data isn't self-authenticating (unlike the raw firehose).

Client libraries:

Official Go client included in the Jetstream repo
Python: Simple WebSocket connection with websockets library — just a few lines
TypeScript: Fully typed client available
Ruby: skyfall gem supports Jetstream since v0.5

(Source: Jetstream Blog Post, Jetstream GitHub, Jaz's Blog)

Feed Generators (Custom Algorithms)

BlueSky's feed generator framework lets you build custom algorithmic feeds that filter the firehose by any criteria — keywords, sentiment, engagement signals, user lists.

Architecture:

Your server subscribes to the firehose/Jetstream
Indexes posts matching your criteria (keyword match, LLM classification, etc.)
Serves a feed endpoint that BlueSky clients can subscribe to
Users add your feed as a custom timeline
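
The indexing and serving steps of this architecture can be sketched with an in-memory keyword index. This is a simplification rather than the official starter kit's code: the class and method names are hypothetical, and the response shape (a "feed" list of objects with a "post" AT-URI) is assumed to match what the feed skeleton endpoint returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KeywordFeed:
    """Toy feed generator: index matching posts, then serve them newest first."""

    keywords: Tuple[str, ...]
    _uris: List[str] = field(default_factory=list, init=False)

    def index(self, uri: str, text: str) -> bool:
        """Store the post URI if its text contains any keyword (case-insensitive)."""
        lowered = text.lower()
        if any(k.lower() in lowered for k in self.keywords):
            self._uris.append(uri)
            return True
        return False

    def skeleton(self, limit: int = 50) -> Dict[str, List[Dict[str, str]]]:
        """Return the most recently indexed posts in the assumed skeleton shape."""
        newest_first = list(reversed(self._uris))[:limit]
        return {"feed": [{"post": uri} for uri in newest_first]}
```

A real generator would persist the index in a database and replace the substring check with whatever classifier the feed needs, such as an LLM relevance call.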

Resources:

Official starter kit: github.com/bluesky-social/feed-generator
Python implementation: github.com/MarshalX/bluesky-feed-generator
Example: Indie Tech Feed filters for open-source, game dev, hacking content

Attie (launched March 2026): BlueSky's own AI assistant (powered by Claude) that lets non-technical users create custom feeds using natural language. Already the second most blocked account (~125K blocks) — indicating user resistance to AI on the platform.

(Source: BlueSky Custom Feeds, Feed Generator GitHub)

BlueSky Scraping Services

Provider	Pricing	Notes
Apify BlueSky Scraper	$1.50/1K posts or free for 100/day	Extract posts, profiles, engagement metrics
AT-bot (MCP-Native)	Free (CC0 license)	31 MCP tools, AES-256-CBC auth, ~300ms post creation
atproto-scraping	Free	Git scraping of AT Protocol instances
BlueSkySight	Free (PyPI)	Python library, Jetstream integration

1.3 Cross-Platform Tools

Tool	Platforms	Type	Key Feature
XActions	X, BlueSky, Mastodon, Threads	Open-source toolkit + MCP	Unified interface, 140+ MCP tools
Polybot	X, Mastodon, BlueSky	Python framework	Cross-platform posting, auto message length
Apify	X, BlueSky, Instagram, TikTok, etc.	Commercial	Pre-built actors for each platform
n8n	Multi-platform via integrations	Workflow builder	515+ social media templates

2. Existing Tools and Agents

2.1 Commercial Reply-Automation Platforms

ReplyGuy (replyguy.com) — The Category Leader

The most polished commercial product for automated social media replies.

How it works:

You define keywords relevant to your product
ReplyGuy scours the web for matching conversations
AI selects high-quality, recent, relevant posts
Generates replies that "genuinely help the original poster while mentioning your product"
Twitter: Fully automated posting (when enabled) or manual
Reddit & LinkedIn: Semi-manual — system identifies + generates, you copy-paste-publish

Platforms: Twitter (auto-reply), Reddit (semi-manual), LinkedIn (semi-manual)
Pricing: Subscription-based (details behind paywall)
Claims: Saves 30-60 hours/month per project

(Source: ReplyGuy, How It Works)

Risk warning: ReplyGuy's Twitter auto-reply feature directly violates X's ToS, which requires "prior written and explicit approval" for AI reply bots. Using it risks account suspension.

Other Commercial Tools

Tool	Platforms	Model	Approach	Notable
TweetHunter	X/Twitter	Proprietary	SaaS	$10M+ exit. Common complaint: robotic AI content
TrendRadar	X/Twitter	AI-powered	SaaS + browser	"Reply guy" growth strategy automation
Hootsuite	Multi-platform	Various	Enterprise SaaS	AI outperforms humans for bottom-of-funnel CTAs
Ayrshare	Multi-platform	API	Developer API	Programmatic posting + reply to comments
Marblism "Sonny"	Multi-platform	AI	Autonomous agent	3-4 daily posts with adaptive tactics
Manus AI	Multi-platform	AI	Autonomous agent	Campaign promotion, content optimization, $39-200/mo

2.2 Browser Extensions (Reply-in-Context)

These inject AI reply generation directly into the social platform's UI. The user sees a post, triggers the extension, reviews the reply, and posts manually.

Extension	Platforms	LLMs	Key Feature	Source
Qura AI	X, LinkedIn, Reddit, FB	GPT-4o, Claude, Gemini	Fine-tuned on millions of tweets, 19+ tone presets	qura.ai
TweetStorm.ai	X/Twitter	Proprietary	Keyword forcing, emoji/hashtag toggles, content history	tweetstorm.ai
XReplyGPT	X/Twitter	OpenAI API	Open-source, never auto-sends	GitHub
twitter-ai-reply	X/Twitter	OpenAI API	Vue.js, tone selection, edit-before-post	GitHub
Smart AI Reply	X, LinkedIn	Proprietary	One-click contextual reply generation	smart-ai-reply.com
GM Bot	X/Twitter	Built-in	Auto scrolls, replies, likes, follows based on settings	Chrome Web Store

2.3 Open-Source Automation Frameworks

twitter-automation-ai (Most Comprehensive)

GitHub: github.com/ihuzaifashoukat/twitter-automation-ai
Stack: Python, Selenium, undetected-chromedriver, selenium-stealth
LLMs: OpenAI, Azure OpenAI, Gemini (via LangChain)
Key features: Multi-account management, keyword-based reply automation with recency filters, LLM relevance scoring (0-1 scale), competitor interaction, sentiment analysis, proxy pool rotation, per-account metrics tracking
Configuration: config/settings.json (global) + config/accounts.json (per-account overrides with keywords, LLM preferences, proxy settings)
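
The global-plus-override configuration layout can be illustrated with a recursive merge. This is only a sketch of the pattern: the project's real key names are not documented here, so every key in the example (llm, keywords, proxy) is hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_settings(global_settings: Dict[str, Any], account_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return global settings with per-account overrides applied on top.

    Nested dictionaries are merged key by key; any other value in the
    override replaces the global value outright. Inputs are not mutated.
    """
    merged = deepcopy(global_settings)
    for key, value in account_overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    global_cfg = {"llm": {"provider": "openai", "temperature": 0.7}, "keywords": ["devtools"]}
    account_cfg = {"llm": {"temperature": 0.2}, "proxy": "http://127.0.0.1:8080"}
    print(merge_settings(global_cfg, account_cfg))
```

In practice the two dictionaries would be loaded from the JSON files with json.load, and the merge would run once per account entry.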

ElizaOS + client-twitter

GitHub: github.com/elizaos-plugins/client-twitter
Framework: ElizaOS — TypeScript framework for autonomous AI agents
Innovation: Twitter client without API key using browser cookies
Features: Post generation, interaction handling, search, Twitter Spaces, optional Discord approval workflow, character files for agent personality, long-term memory
Community: Massive open-source community (ai16z origin)

socialautonomies

GitHub: github.com/Prem95/socialautonomies
Stack: Next.js 14, TypeScript, Prisma, Supabase auth, Stripe
Architecture: Full SaaS platform with auto-reply, auto-engage, tweet scheduling, analytics dashboard
Twitter client: ElizaOS agent-twitter-client (cookie-based, no API key)

AT-bot (BlueSky MCP-Native)

Source: Automating Bluesky for AI Agents
Architecture: CLI (Bash 4.0+) + MCP Server (TypeScript/Node.js 18+)
31 MCP tools across Authentication, Content, Feed, Profile, Search, Engagement
Performance: Auth ~500ms, post creation ~300ms, <5MB memory, 100+ ops/minute
License: CC0-1.0 (public domain)

2.4 Workflow Orchestration Pipelines

LangChain social-media-agent (Reference Implementation)

GitHub: github.com/langchain-ai/social-media-agent
The gold standard for monitor-filter-generate-review-post pipelines
Stack: LangGraph, Claude (Anthropic API), FireCrawl, Supabase, TypeScript/React
Architecture: Content sources -> FireCrawl scraping -> Claude relevance evaluation -> marketing report -> platform-specific post generation -> image suggestion -> human review (Agent Inbox UI) -> OAuth posting -> Slack notification
Human-in-the-loop: LangGraph interrupts at decision points. Users approve/modify/reject via Agent Inbox web UI.
Batch mode: Slack channel ingestion with daily cron triggers

n8n Pipelines

515+ social media templates at n8n.io/workflows/categories/social-media/
Self-hosted, native LangChain support (n8n 2.0), human-in-the-loop patterns
Notable templates: "AI-powered news monitoring & social post generator", "Social media sentiment analysis dashboard", "Multi-platform content creation with AI"
Pipeline pattern: Trigger nodes (RSS, webhooks, cron) -> AI processing (summarization, adaptation) -> Quality control (Slack approval) -> Multi-platform publishing -> Feedback loops

2.5 AI Agent Frameworks

Framework	Language	Social Media Relevance
LangGraph	Python/TS	Best for stateful monitor-filter-generate-review pipelines with human-in-the-loop interrupts
CrewAI	Python	Role-based agent teams: "social media manager" + "content writer" + "reviewer"
ElizaOS	TypeScript	Native Twitter/Discord/Telegram clients, personality system, long-term memory
AutoGen	Python	Multi-agent debate on reply quality before posting
LangChain	Python/TS	Foundation layer, 1000+ integrations

3. X Algorithm and Engagement Mechanics (2026)

Understanding the algorithm is critical for any reply strategy. In January 2026, xAI released a Grok-powered transformer model replacing the legacy system.

Engagement Weight Hierarchy

Action	Algorithmic Weight (relative to like)	Notes
Reply	~15x	Most valuable single action
Reply + author reply back	~150x	Conversation = massive distribution boost
Retweet	20x
Profile click	12x
Link click	11x
Bookmark	10x
Like	1x (baseline)	Weakest signal

Thread compounding: A thread with 5+ back-and-forth replies receives 3-4x the impressions of a tweet with 5 standalone likes. Author response to replies triggers 2.5x more out-of-network reach.

Premium boost: Premium subscribers receive ~10x more impressions (4x in-network, 2x out-of-network).

(Source: Reply Guy Framework, PostEverywhere, OpenTweet)
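
To see how these weights combine, the sketch below scores a post as a weighted sum of its engagement counts. The weights are the approximate figures from the table; treating the score as a plain linear sum is an assumption for illustration, not a description of X's actual ranking model.

```python
from typing import Dict

# Approximate weights relative to a like, taken from the table above.
WEIGHTS: Dict[str, int] = {
    "reply": 15,
    "reply_with_author_reply": 150,
    "retweet": 20,
    "profile_click": 12,
    "link_click": 11,
    "bookmark": 10,
    "like": 1,
}


def engagement_score(counts: Dict[str, int]) -> int:
    """Weighted sum of engagement actions, expressed in like-equivalents."""
    return sum(WEIGHTS[action] * n for action, n in counts.items())


if __name__ == "__main__":
    print(engagement_score({"like": 100}))
    print(engagement_score({"reply_with_author_reply": 1, "like": 5}))
```

Under this simplified model, a single reply that the author answers already outweighs a hundred likes, which is the intuition behind prioritizing conversations over passive reactions.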

Time Decay

Critical window: First 30 minutes determines distribution trajectory
Half-life: 18-43 minutes
95% of distribution occurs within 24 hours
Velocity test: 50 engagements in 1 hour = massive distribution; 50 over 24 hours = buried
Consistency signal: Missing 3+ consecutive activity days triggers algorithmic throttling
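
The half-life figure can be turned into rough numbers with an exponential decay model. The model itself is an assumption: the sources quote a half-life range but do not state the decay curve, so this sketch is only a way to reason about orders of magnitude.

```python
import math


def remaining_fraction(minutes: float, half_life_minutes: float) -> float:
    """Fraction of a post's distribution rate left after `minutes`."""
    return 0.5 ** (minutes / half_life_minutes)


def minutes_until_delivered(fraction: float, half_life_minutes: float) -> float:
    """Minutes until `fraction` of total distribution has occurred under pure exponential decay."""
    return half_life_minutes * math.log2(1.0 / (1.0 - fraction))


if __name__ == "__main__":
    for half_life in (18, 43):
        print(half_life, round(minutes_until_delivered(0.95, half_life), 1))
```

Under a pure exponential model, 95% of distribution would arrive within roughly 78 to 186 minutes for the quoted half-life range, far sooner than 24 hours. The two figures therefore likely come from different sources or describe a long tail that a single half-life does not capture.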

What This Means for Reply Strategy

Under the weights above, a reply chain where the author engages back (~150x) is the most heavily weighted signal: one such exchange counts for roughly as much as 150 likes. Consistent high-quality replies build your account reputation score, meaning your original posts start with better distribution. The flywheel: replies -> reputation -> better distribution on original content.

4. Architecture Pattern: Monitor -> Filter -> Generate -> Review -> Post

┌─────────────────────────────────────────────────────────────────┐
│  1. MONITOR                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │ Keywords  │  │ Mentions │  │ Competitor│  │ Jetstream/   │   │
│  │ Tracking  │  │ Listener │  │ Scraper   │  │ Firehose     │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────┬───────┘   │
│       └──────────────┴─────────────┴───────────────┘            │
│                          ↓                                      │
│  2. FILTER                                                      │
│  ┌─────────────────────────────────────────────────────┐        │
│  │ Relevance scoring (LLM-based, 0-1 threshold)       │        │
│  │ Sentiment analysis (positive/negative/question)     │        │
│  │ Deduplication (Redis/DB state tracking)             │        │
│  │ Recency filter (configurable time window)           │        │
│  │ Engagement threshold (min likes/retweets)           │        │
│  │ Author authority filter (follower count, blue check)│        │
│  └─────────────────────┬───────────────────────────────┘        │
│                        ↓                                        │
│  3. GENERATE                                                    │
│  ┌─────────────────────────────────────────────────────┐        │
│  │ LLM reply generation with:                          │        │
│  │  - Brand voice / tone guidelines                    │        │
│  │  - Character limits (280 Twitter / 300 BlueSky)     │        │
│  │  - Context window (original post + thread)          │        │
│  │  - Few-shot examples of ideal replies               │        │
│  │  - Product mention rules (when/how to reference)    │        │
│  │  - Structured JSON output for metadata              │        │
│  └─────────────────────┬───────────────────────────────┘        │
│                        ↓                                        │
│  4. REVIEW (Human-in-the-Loop)                                  │
│  ┌─────────────────────────────────────────────────────┐        │
│  │ Options:                                            │        │
│  │  a) Slack/Discord notification with approve/reject  │        │
│  │  b) Google Sheet staging (original + draft pairs)   │        │
│  │  c) Web UI dashboard (LangGraph Agent Inbox)        │        │
│  │  d) Email digest with one-click approval            │        │
│  │ Conditional: auto-approve high-confidence, flag low │        │
│  └─────────────────────┬───────────────────────────────┘        │
│                        ↓                                        │
│  5. POST                                                        │
│  ┌─────────────────────────────────────────────────────┐        │
│  │ Platform API posting (Tweepy/API, AT Protocol)      │        │
│  │ Rate limiting and natural timing jitter              │        │
│  │ Screenshot/proof capture                            │        │
│  │ Analytics logging (Airtable, JSON, DB)              │        │
│  │ Feedback loop → refine future scoring               │        │
│  └─────────────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────┘
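
The five stages in the diagram can be wired together as a small function that takes each stage as a callable. This is a minimal sketch under stated assumptions: the type names, the 0.7 relevance threshold, and the stage signatures are all hypothetical, and the length check counts Python code points rather than the graphemes a platform may count.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Character limits from the GENERATE stage of the diagram.
LIMITS = {"x": 280, "bluesky": 300}


@dataclass
class Post:
    platform: str
    uri: str
    text: str


@dataclass
class Draft:
    post: Post
    reply: str
    approved: bool = False


def run_pipeline(
    posts: Iterable[Post],
    score: Callable[[Post], float],
    generate: Callable[[Post], str],
    review: Callable[[Draft], bool],
    publish: Callable[[Draft], None],
    threshold: float = 0.7,
) -> List[Draft]:
    """Run monitor -> filter -> generate -> review -> post over a batch of posts."""
    published: List[Draft] = []
    seen = set()
    for post in posts:                      # 1. MONITOR: posts come from any source
        if post.uri in seen:                # 2. FILTER: deduplicate
            continue
        seen.add(post.uri)
        if score(post) < threshold:         # 2. FILTER: relevance threshold
            continue
        reply = generate(post)              # 3. GENERATE
        if len(reply) > LIMITS[post.platform]:
            continue                        # drop drafts that exceed the platform limit
        draft = Draft(post, reply)
        draft.approved = review(draft)      # 4. REVIEW: human approves or rejects
        if draft.approved:
            publish(draft)                  # 5. POST
            published.append(draft)
    return published
```

Passing the stages in as callables keeps the review step swappable, so the same loop works with a Slack approval hook, a spreadsheet queue, or a web inbox.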

Implementation Approaches by Complexity

Complexity	Stack	Best For	Cost
Low	Chrome extension (Qura/TweetStorm)	Solo creators, manual reply-by-reply	Free-$20/mo
Medium	n8n/Zapier + OpenAI + Slack approval	Small teams, scheduled content	$20-100/mo
High	LangGraph + custom agents + Supabase	Brands needing full pipeline control	Dev time + API costs
Maximum	twitter-automation-ai + multi-account	Growth hackers (high platform risk)	Dev time + account risk

Platform-Specific Pipeline Considerations

For X/Twitter monitoring:

Best approach: Official API (pay-per-use at ~$0.01/tweet) or x-twitter-scraper ($0.00015/call)
Search endpoint for keyword monitoring
Streaming not available below Pro tier ($5K/mo)
Cookie-based scrapers (twscrape, Scweet) for budget-constrained monitoring

For BlueSky monitoring:

Best approach: Jetstream WebSocket (free, real-time, ~850 MB/day)
Connect to wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post
Filter client-side by keyword matching on post text
Or build a feed generator that indexes matching posts server-side
Public API search endpoint (free, requires auth)
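
Client-side keyword filtering on the Jetstream feed might look like the sketch below. The event field names (kind, commit, operation, collection, record, text) are assumptions about the Jetstream JSON format and should be checked against the Jetstream repository; the function name is hypothetical.

```python
import json
from typing import Iterable, Optional


def matching_post_text(raw_message: str, keywords: Iterable[str]) -> Optional[str]:
    """Return the post text if a Jetstream message is a new post containing any keyword."""
    event = json.loads(raw_message)
    # Assumed event layout: only "commit" events carry record changes.
    if event.get("kind") != "commit":
        return None
    commit = event.get("commit", {})
    if commit.get("operation") != "create":
        return None
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    text = commit.get("record", {}).get("text", "")
    lowered = text.lower()
    if any(k.lower() in lowered for k in keywords):
        return text
    return None
```

Each message received over the WebSocket connection would be passed to this function, and non-None results handed to the filter stage of the pipeline.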

5. BlueSky vs X Comparison: Scraper-Friendliness

Dimension	X/Twitter	BlueSky
API cost for reads	$200/mo minimum (Basic) or ~$0.01/tweet (pay-per-use)	Free (AT Protocol public API)
Real-time stream	Pro tier ($5K/mo) or unofficial scrapers	Free (Jetstream WebSocket, 4 public instances)
Auth for public reads	Required (API key or cookies)	Not required for profiles/posts (search needs auth)
Rate limits	Aggressive (varies by tier)	Generous (5K points/hr, 35K/day)
Bot policy	Must label, engagement automation banned	Must label, opt-in interaction only
Scraping stance	Explicitly banned in ToS since Sept 2023	Open protocol, encouraged
Data format	Proprietary GraphQL (reverse-engineered)	Open AT Protocol (documented, stable)
Community tools	Many, but all in gray area	Growing, all legitimate
User base	~600M+ accounts	~30M+ accounts
Developer audience	Mixed	High concentration of developers and tech community
Feed generators	No equivalent	Custom algorithmic feeds anyone can build
MCP integration	XActions (140+ tools), x-twitter-scraper	AT-bot (31 tools), growing
Legal risk (EU)	High (scraping = GDPR violation per Dutch DPA)	Lower (open protocol, but GDPR still applies to personal data)

Verdict: BlueSky is dramatically more scraper-friendly. The AT Protocol's openness, free Jetstream firehose, and explicit bot support make it the clear choice for automated monitoring. X has a larger audience but higher cost, legal risk, and platform risk. The trade-off is reach (X) vs. accessibility (BlueSky).

6. Browser Automation Agents for Social Media

The browser agent market is exploding ($4.5B in 2024, projected $76.8B by 2034):

Agent	Type	Social Media Capability	Pricing
Browser Use	Open-source Python	Mass posting, follower engagement, account tasks	Free
Skyvern	AI + computer vision	LinkedIn bulk actions, CAPTCHA handling	Freemium
Axiom.ai	No-code Chrome extension	Bulk uploads, data scraping, GPT-drafted replies	Freemium
PhantomBuster	Cloud automation	LinkedIn/Twitter/Instagram bots, auto-following	Credit-based
Browserbase + Stagehand	Cloud + open SDK	Enterprise LinkedIn at scale, session persistence	Usage-based
Vercel Agent Browser	Headless CLI	General browser automation, 12.1K GitHub stars	Free

Browser Use (github.com/browser-use/browser-use): 89.1% success rate on WebVoyager benchmark. Open-source Python framework that gives any LLM (GPT-4, Claude, local models) browser control. Self-hosted, customizable, no vendor lock-in.

7. Case Studies and Reported Results

ReplyGuy Users

Saves 30-60 hours/month per project
Fully automated Twitter replies + semi-manual Reddit/LinkedIn

Maybe AI Users

1-2 hours/day reclaimed from manual reply work
3x increase in comments posted
More natural-sounding than fully manual (counterintuitive)

Hootsuite AI Experiment

AI outperforms humans for bottom-of-funnel CTAs
AI underperforms for brand humor, cultural references, current events
Best results: hybrid (AI drafts, human refines)

"Reply Guy" Growth Strategy (Manual)

Documented pattern: find high-engagement tweets -> post valuable replies -> gain impressions -> convert to followers
Best templates: respectful contrarian, data nuggets, operator lens, mini-case studies
10 high-value replies > 50 generic ones
Replies are worth 15-27x more than likes algorithmically

TweetHunter/Taplio Exit ($10M+)

Built 2021, sold 2022 for 8 figures
Most common complaint: AI content lacks authenticity, requires extensive editing
Lesson: market is proven, but quality remains the bottleneck

8. Academic Research

Paper	Focus	URL
"Can LLMs Simulate Social Media Engagement?"	Action-guided response generation	arxiv.org/html/2502.12073v1
"SoMe: Realistic Benchmark for LLM-based Social Media Agents"	Evaluating AI agent social media behavior	arxiv.org/html/2512.14720v1
"@grokSet: Multi-party Human-LLM Interactions"	Human-LLM interaction in real social media	arxiv.org/html/2602.21236

Implications for Kendo

BlueSky is the low-risk monitoring opportunity. Jetstream provides free, real-time access to all public posts over an open protocol. A keyword monitor for developer tool discussions is technically trivial to build and is consistent with BlueSky's terms; GDPR still applies to any personal data collected, so store no more than is needed.
X monitoring is expensive or legally risky. The legitimate path is $200/mo API or ~$0.01/tweet pay-per-use. Cookie-based scrapers work but violate ToS and create GDPR exposure for a Dutch company.
Human-in-the-loop is non-negotiable. Every successful implementation uses human review before posting. Fully automated replies violate X's ToS, risk BlueSky community backlash, and trigger EU AI Act transparency obligations from August 2026.
The LangGraph social-media-agent is the reference architecture. If Kendo ever builds a content pipeline, this is the pattern: LangGraph state machine with interrupt-based human review, multi-platform posting, and observability.
For Kendo's current stage, manual engagement is the right strategy. AI drafting + human review + manual posting is the sweet spot — zero legal risk, zero platform risk, and the algorithm rewards genuine conversation over volume.
MCP-native tools are the 2026 trend. XActions (140+ tools), x-twitter-scraper, AT-bot, and Apify's MCP endpoints show that social media data is becoming a first-class data source for AI agents. This aligns with Kendo's MCP-aware architecture.

Open Questions

How reliable are the new MCP-native social media tools (XActions, x-twitter-scraper) in practice? Are they production-stable or demo-ware?
What is the actual suspension rate for accounts using cookie-based scrapers (twscrape, Scweet, ElizaOS) at moderate volumes?
Can a BlueSky Jetstream-based keyword monitor be productized as a Kendo feature (e.g., "social listening" for project-related discussions)?
How will the EU AI Act's Article 50 transparency requirements (August 2026) be enforced in practice for social media bots?
Is BlueSky's developer audience large enough to justify platform-specific monitoring for a dev tool like Kendo?
What approval UX works best for a solo founder? Slack notifications, web dashboard, or spreadsheet staging?
