This Is Where We Break Things On Purpose.
CipherBitz Labs is where we run experiments that are too early for products, too honest for case studies, and too interesting to ignore. Some graduate into our products. Some fail publicly. All are documented. Nothing here is polished.
Twelve Experiments. Zero Are Polished.
These are live. Some have clear hypotheses. Some are exploratory. All are documented with what we expected to find and what we have found so far.
LLM Prompt Caching — Response Latency
Caching common AI prompt patterns at the edge will reduce Gemini API call volume by 40%+ without degrading response quality for user queries.
14 days in: 33% cache hit rate achieved. Quality degradation on location-specific queries — investigating.
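Hit rate in this kind of experiment hinges on recognising equivalent prompts. A minimal sketch of prompt normalisation into cache keys — the rules and names here are illustrative, not the production implementation:

```typescript
// Normalise a user prompt into a deterministic cache key so that
// trivially different phrasings ("What is EMI?" vs "what is emi")
// resolve to the same cached response. Rules are illustrative.
function promptCacheKey(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // strip punctuation
    .replace(/\s+/g, " ")             // collapse whitespace
    .trim();
}

// A tiny in-memory cache keyed on the normalised prompt.
const cache = new Map<string, string>();

function getCached(prompt: string): string | undefined {
  return cache.get(promptCacheKey(prompt));
}

function setCached(prompt: string, response: string): void {
  cache.set(promptCacheKey(prompt), response);
}
```

Note that normalisation this aggressive trades precision for hit rate — part of why quality on specific query classes has to be watched.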
GSAP ScrollTrigger on iOS Safari
Complex scroll-pinned GSAP animations cause frame drops below 30fps on iPhone 13 and older in Safari — a subset of our user base.
Confirmed: pinned sections with >3 GSAP targets drop to 22fps on iPhone 12 Safari. Mitigation: reduce pinned target count.
n8n Self-Hosted Memory Profiling
n8n's default queue mode on shared hosting consumes 400MB+ at idle. Switching to webhook execution mode reduces idle RAM to under 150MB.
21 days in: webhook mode at 142MB idle. Queue mode at 380MB. Hypothesis confirmed. Documenting workflow constraints.
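For reference, the execution-mode switch is an environment-variable change on the n8n instance. A sketch — verify the exact variable names against the n8n version you run:

```shell
# Queue mode (higher idle RAM in our measurement; also requires Redis):
EXECUTIONS_MODE=queue

# Regular main-process execution (the lighter-weight mode on shared hosting):
EXECUTIONS_MODE=regular
```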
Lottie vs Rive — Production Tradeoffs
Rive's runtime renders at smaller file sizes than Lottie for equivalent animations — but has a steeper design tooling curve that may outweigh benefits.
3 days in: Rive at 60% of Lottie file size for test animation. Tooling curve is real — Figma→Rive workflow non-trivial.
TanStack Query vs SWR — Cache Behaviour
TanStack Query's cache invalidation offers more granular control than SWR for product-level data, but adds ~12KB to bundle at tree-shaken minimum.
Bundle impact: +9.4KB (not 12KB as modelled). Cache granularity: TanStack clearly superior for optimistic updates.
Cloudflare Workers AI for Edge Inference
Running lightweight classification models at the Cloudflare edge will reduce latency for AI-routed requests by 200ms+ vs full Gemini API calls.
Edge inference latency: 40ms vs 320ms for Gemini. Quality gap on nuanced queries. Good for routing/classification, not answers.
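The routing split that finding implies can be sketched as a pure decision function: trust the cheap edge classifier only when it is confident and the query class is simple, otherwise fall through to the full Gemini call. Labels and the threshold below are illustrative, not the production values:

```typescript
interface Classification {
  label: string;      // e.g. "faq", "smalltalk", "complex"
  confidence: number; // 0..1 from the lightweight edge classifier
}

type Route = "edge" | "gemini";

// Route to the edge path only when the classifier is confident;
// nuanced or low-confidence queries go to the full model.
function routeRequest(c: Classification, threshold = 0.85): Route {
  if (c.confidence >= threshold && c.label !== "complex") {
    return "edge";
  }
  return "gemini";
}
```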
Next.js App Router Static Export
Converting FinCalc to full static export using Next.js App Router will reduce TTFB to under 50ms globally via Cloudflare CDN without sacrificing dynamic calculator behaviour.
TTFB: 38ms globally (target met). Dynamic client hydration stable. Exploring for other content-heavy products.
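The static export itself is a one-line change in Next.js config (the documented `output: 'export'` option; the image setting is only needed if the default loader is in use):

```javascript
// next.config.js — fully static export via the App Router.
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'export',             // emit static HTML to ./out
  images: { unoptimized: true } // default image loader needs a server
};

module.exports = nextConfig;
```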
Rich Snippet Schema → CTR Lift Measurement
FAQ and HowTo schema on FinCalc calculator pages will increase organic CTR by 15%+ versus pages without structured data, within 60 days.
Day 28 of 60: CTR up 9.3% on schema pages vs control. Trending toward hypothesis. Insufficient data to conclude.
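For context, the structured data under test looks roughly like this JSON-LD (question and answer text are placeholders, not the live page content):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How is EMI calculated?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "EMI is derived from the principal, the monthly interest rate, and the loan tenure in months."
      }
    }
  ]
}
```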
Gemini Auto-Tagging for NammaHubballi
Gemini Flash can auto-categorise and tag new business listings for NammaHubballi at 90%+ accuracy versus manual human tagging.
Accuracy on 200 listings: 87.5%. Fails most often on ambiguous multi-category businesses. Human review needed for 12%.
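The "human review for 12%" routing can be expressed as a simple confidence gate — a sketch, with illustrative field names and cutoff:

```typescript
interface TagResult {
  listingId: string;
  tags: string[];
  confidence: number; // model-reported confidence, 0..1
}

// Auto-publish confident single-category tags; queue ambiguous
// (low-confidence or multi-category) listings for human review.
function needsHumanReview(r: TagResult, cutoff = 0.8): boolean {
  return r.confidence < cutoff || r.tags.length > 1;
}
```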
PostgreSQL FTS vs Algolia for MNCJob
PostgreSQL native full-text search with tsvector and GIN index can match Algolia's search quality for MNCJob's job query patterns at zero extra cost.
Simple keyword search: PG FTS comparable. Fuzzy/typo-tolerant search: Algolia wins clearly. Verdict: depends on query mix.
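The Postgres side of the comparison is the standard tsvector/GIN pattern — sketched below with illustrative table and column names:

```sql
-- Generated tsvector column plus GIN index for job search.
ALTER TABLE jobs
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
  ) STORED;

CREATE INDEX jobs_search_idx ON jobs USING GIN (search_vector);

-- Ranked keyword query. No typo tolerance out of the box:
-- exactly the gap where Algolia wins.
SELECT id, title, ts_rank(search_vector, q) AS rank
FROM jobs, websearch_to_tsquery('english', 'react developer') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 20;
```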
React 19 Compiler — Re-render Reduction
React 19's compiler will eliminate 60%+ of unnecessary re-renders in the NextGirl product catalog component tree without any manual memoisation.
React DevTools profiler: 54% re-render reduction. Below hypothesis but significant. No manual memo needed — clean code win.
Deno Deploy for Webhook Receivers
Deno Deploy's edge functions will handle inbound webhook processing at lower latency and cost than PM2 Node.js process on shared VPS hosting.
2 days in: too early for conclusions. Cold start variance needs more data. Initial results inconsistent — continue.
Opinions Without Experiments Are Just Preferences.
Every stack decision, every architecture choice, every technology we recommend to clients was validated here first. CipherBitz Labs is the source of our engineering opinions — not conference talks, not blog posts, not vibes.
Every experiment starts with a hypothesis.
Before a single line of experiment code is written, the hypothesis is documented — what we believe to be true, how we will measure whether it is true, and what success and failure both look like quantitatively. An experiment without a hypothesis is just tinkering.
If you cannot write the hypothesis in two sentences with a measurable outcome, you are not ready to run the experiment.
Failed experiments are documented more carefully than successes.
When an experiment succeeds, it is easy to document — the outcome is obvious and the path is clear. When an experiment fails, the failure reason must be understood and documented precisely — otherwise the same false hypothesis gets run again six months later by someone who wasn't in the room.
Every failed experiment gets a post-mortem document: what we believed, what we found, and what we now know instead.
Every experiment has a defined end date.
Open-ended experiments are research projects without accountability. Every experiment in CipherBitz Labs has a time box — typically 2 to 8 weeks. At the end of the box, the experiment is either graduated, formally closed, or explicitly extended with a new hypothesis and new end date. No zombie experiments that run forever without results.
If an experiment has been 'running' for more than 8 weeks without a result update, it is not an experiment — it is a backlog item.
Experiments run on real products — not sandboxes.
A sandbox experiment proves that something works in a sandbox. We run most of our experiments on the real CipherBitz product portfolio — real users, real traffic, real data, and real constraints. The friction of production is part of the experiment. If it only works in a clean environment, it does not work.
Experiments that cannot be tested on a live product are deprioritised. Real-world constraint is a feature of good experiments.
Six Experiments That Became Real Products.
These started as hypotheses in Labs. They became the features, architectures, and decisions that define our six live products today.
Next.js 15 App Router Migration
Whether App Router's React Server Components improve LCP on data-heavy product pages vs Pages Router with SSR.
LCP improved 34% on FreeBill and NextGirl product pages. Bundle size reduced. Cold start latency within acceptable threshold on cPanel-hosted Node.js.
AI-Powered Outfit Recommendations
Whether Google Gemini Flash could produce contextually relevant outfit combinations from a product catalog based on occasion, season, and style preference.
85% user acceptance rate on AI-suggested outfits. Reduced session time to first add-to-cart by 22%. Gemini Flash sufficient — Pro model not needed.
Hyperlocal City AI — Context Architecture
Whether a Gemini model grounded in curated city-specific data can answer hyperlocal questions more accurately than a general LLM with no local context.
Grounded model: 91% accuracy on local Q&A. Base Gemini: 43%. Context injection approach validated — scalable to multiple cities.
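Context injection here amounts to disciplined prompt assembly: curated, verified city facts are prepended before the user's question. A framework-free sketch — structure and field names are illustrative, not the production prompt:

```typescript
interface CityFact {
  topic: string;
  fact: string;
}

// Ground the model by injecting curated city-specific facts, so
// answers draw on verified local data rather than the base model's
// (often stale or wrong) world knowledge.
function buildGroundedPrompt(
  city: string,
  facts: CityFact[],
  question: string,
): string {
  const context = facts.map((f) => `- [${f.topic}] ${f.fact}`).join("\n");
  return [
    `You answer questions about ${city}.`,
    `Use ONLY the verified facts below; if they don't cover the question, say you don't know.`,
    `Facts:\n${context}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```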
FAQ Schema on Financial Calculators
Whether HowTo and FAQ schema on financial calculator pages produces rich snippets in Google Search and improves CTR.
Rich snippets appearing on 68% of schema-tagged pages. Average CTR lift: 19.2% over 60-day measurement period.
PM2 Cluster Mode — Multi-Core Utilisation
Whether PM2 cluster mode distributes Next.js server load across all VPS cores and reduces P95 response time under simulated concurrent user load.
P95 response time: 220ms (single) vs 94ms (cluster, 4 cores). Worth operational complexity for products >500 concurrent users.
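The setup being measured is PM2's documented cluster mode. A minimal ecosystem file — app name and instance count are illustrative:

```javascript
// ecosystem.config.js — run the Next.js server across all cores.
module.exports = {
  apps: [
    {
      name: 'web',
      script: 'node_modules/next/dist/bin/next',
      args: 'start',
      exec_mode: 'cluster', // PM2 load-balances across workers
      instances: 'max',     // one worker per available core
    },
  ],
};
```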
Optimistic Updates for Invoice State
Whether TanStack Query's optimistic update pattern reduces perceived latency on invoice status changes in FreeBill without introducing data inconsistency.
Perceived save latency: 0ms (vs 340ms network round-trip). Zero data inconsistency in 2,400+ invoice operations tested. Pattern documented for team reuse.
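The pattern behind that 0ms perceived latency, framework-free for illustration (TanStack Query packages this as `onMutate`/`onError` rollback — the sketch below is the idea, not its actual API):

```typescript
type Status = "draft" | "sent" | "paid";

interface Invoice {
  id: string;
  status: Status;
}

// Apply the change locally first (UI updates instantly), then
// confirm with the server; roll back local state if it fails.
async function optimisticSetStatus(
  store: Map<string, Invoice>,
  id: string,
  next: Status,
  persist: (id: string, status: Status) => Promise<void>,
): Promise<void> {
  const prev = store.get(id);
  if (!prev) throw new Error(`unknown invoice ${id}`);
  store.set(id, { ...prev, status: next }); // optimistic write
  try {
    await persist(id, next); // network round-trip happens after
  } catch (err) {
    store.set(id, prev); // rollback to pre-mutation snapshot
    throw err;
  }
}
```

The consistency guarantee reduces to keeping the pre-mutation snapshot and restoring it on any persistence failure.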
Eight Failures. All Documented. None Hidden.
Publishing failures is how we prove the successes are real. Every experiment below ran, failed for a specific reason, and produced a documented learning. We will not build the same thing twice.
Real-Time Collaborative Invoice Editing
WebSocket architecture required persistent server connections — incompatible with our cPanel shared hosting model. Could not implement without migrating to a cloud provider. Cost of migration outweighed the feature value at current user scale.
WebSocket features are blocked by hosting topology, not by engineering skill. Evaluate hosting before designing real-time features.
Voice Search Integration — MNCJob
Web Speech API behaviour inconsistent across Chrome mobile, Safari iOS, and Firefox. In particular, iOS Safari 16.x had 40%+ error rate on job search queries due to continuous recognition session drops.
Web Speech API is not production-ready for critical search flows on mobile. Third-party voice SDKs add cost and dependency risk not justified by demand.
Vercel Serverless — Cold Start Reduction
Cold start reduction techniques (edge middleware, prewarming, reduced bundle size) improved P95 cold start from 1400ms to 900ms — still above our 300ms target for product page loads. Serverless cold starts on complex Next.js pages are an architectural constraint, not an optimisation problem.
For products requiring consistent P95 under 300ms: always-on Node.js server (PM2) outperforms serverless. Serverless is wrong for our use case.
AI Auto-Generated Business Descriptions
Gemini-generated business descriptions for NammaHubballi listings were factually plausible but frequently wrong about hours, specialties, and contact details — because the model hallucinated details not in its context. Publishing AI descriptions without human verification created incorrect listings.
AI-generated local business content requires human verification before publish. The cost of verification eliminated the automation benefit. Reverted to manual descriptions.
What We're Designing Experiments For Next.
These are not product roadmap items. They are hypotheses that are interesting enough to test — but the experiment design isn't ready yet.
Can Gemini 2.0 Replace Search For AskBLR?
Will Gemini 2.0's expanded context window allow full conversational city search without a traditional search index?
Partial Prerendering — Real-World Benefit
Does Next.js Partial Prerendering produce measurable LCP improvement for e-commerce product catalog pages with mixed static/dynamic content?
AI-Generated FAQ Pages — Indexing Rate
Do AI-generated FAQ pages for long-tail queries get indexed and rank as quickly as manually written equivalents with equivalent schema markup?
pgvector for Semantic Product Search
Can pgvector embeddings in PostgreSQL replace keyword search for NextGirl product discovery with better recall and no added infrastructure cost?
GSAP vs CSS Scroll-Driven Animations
Does the Chrome-native CSS Scroll-Driven Animations API (2024) produce equivalent visual output to GSAP ScrollTrigger at lower JavaScript weight?
Bun Runtime for Next.js on VPS
Does replacing Node.js with Bun as the Next.js runtime on a VPS server reduce memory consumption and improve request throughput meaningfully?
Three Ways to Connect With What We're Learning.
Follow the Experiments
Every active experiment is documented publicly here — hypothesis, methodology, findings updated in real time. No email gate. No newsletter subscription. Just open research you can follow as it runs.
- ✓ Live experiment status updates
- ✓ Documented methodologies to reuse
- ✓ Honest failure reports when experiments close
- ✓ Findings before they become product decisions
Run Your Experiment Here
If you have a technical hypothesis about AI, frontend performance, or infrastructure that you want to test on real production systems with real data — we are open to structured collaboration on experiments that advance shared knowledge.
- ✓ Access to live product data sets (anonymised)
- ✓ Real traffic and production constraints
- ✓ Joint documentation of findings
- ✓ Published results credited to both parties
Apply Our Findings to Your Build
Every graduated experiment in Labs is directly applicable to client product builds. When we recommend a stack choice, a caching strategy, or an AI integration approach — it has been tested here first. You are not getting our opinions. You are getting our results.
- ✓ Stack decisions backed by measured outcomes
- ✓ Architecture choices proven on production data
- ✓ Failed approaches documented — not repeated
- ✓ Ongoing research informing every engagement
What Are You Curious About?
Tell us what you are trying to understand — a technology decision, a product architecture question, or an AI integration approach you are not sure about. We will tell you whether we have run an experiment on it, what we found, and what we would do differently.