Labs Overview
ACTIVE EXPERIMENTS — BUILD IN PUBLIC

This Is Where We
Break Things On
Purpose.

CipherBitz Labs is where we run experiments that are too early for products, too honest for case studies, and too interesting to ignore. Some graduate into our products. Some fail publicly. All are documented. Nothing here is polished.

12
Active experiments right now
6
Graduated to live products
8
Failed — documented honestly
2016
Experimenting continuously since
CipherBitz Labs — Experiment Tracker · 26 total experiments · 12 active
RUNNING · 12
LLM Prompt Caching Test ACTIVE
AI · Gemini · Running 14d
GSAP Scroll Performance on iOS ACTIVE
Frontend · GSAP · Running 6d
n8n Self-Hosted RAM Optimisation ACTIVE
Infrastructure · Running 21d
Lottie vs Rive — File Size Comparison ACTIVE
Motion · UI · Running 3d
GRADUATED · 6
Next.js 15 App Router Migration ✓ SHIPPED
Framework · Completed 6wk
→ Shipped to NextGirl, FreeBill
AI Product Recommendations (NextGirl) ✓ SHIPPED
AI · E-commerce · Completed 12wk
→ Live on nextgirl.in
City AI Chatbot Architecture ✓ SHIPPED
AI · Local · Completed 8wk
→ Deployed: AskBLR, NammaHubballi
FAILED · 8
Real-time Collaboration in FreeBill ✗ CLOSED
SaaS · WebSocket · Closed after 4wk
Too complex for current user needs
Voice Search for MNCJob ✗ CLOSED
AI · Voice · Closed after 3wk
Browser API inconsistency on mobile
Serverless Cold Start Optimisation ✗ CLOSED
Infrastructure · Closed after 6wk
Hosting constraints — wrong approach
CURRENTLY RUNNING

Twelve Experiments. Zero Are Polished.

These are live. Some have clear hypotheses. Some are exploratory. All are documented with what we expected to find and what we have found so far.

AI
ACTIVE

LLM Prompt Caching — Response Latency

HYPOTHESIS

Caching common AI prompt patterns at the edge will reduce Gemini API call volume by 40%+ without degrading response quality for user queries.

FINDING SO FAR

14 days in: 33% cache hit rate achieved. Quality degradation on location-specific queries — investigating.

Running 14d
AI · Gemini
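
The caching idea this experiment tests can be sketched in a few lines: normalise the prompt, key a cache on the normalised form, and expire entries so stale answers age out. Everything here is illustrative — the function names, the five-minute TTL, and the in-memory `Map` store are assumptions, not the production edge cache.

```typescript
// Minimal prompt-cache sketch. A real edge deployment would swap the
// Map for a KV store and likely hash the normalised prompt as the key.
type CacheEntry = { response: string; expiresAt: number };

const cache = new Map<string, CacheEntry>();
const TTL_MS = 5 * 60 * 1000; // assumption: 5-minute freshness window

function cacheKey(prompt: string): string {
  // Normalise whitespace and case so trivially different prompts collide.
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

async function answer(
  prompt: string,
  callModel: (p: string) => Promise<string>,
): Promise<{ response: string; cached: boolean }> {
  const key = cacheKey(prompt);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return { response: hit.response, cached: true };
  }
  const response = await callModel(prompt);
  cache.set(key, { response, expiresAt: Date.now() + TTL_MS });
  return { response, cached: false };
}
```

Note that exact-match keying like this is also where the quality risk in the finding comes from: two location-specific queries can normalise to the same key while deserving different answers.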
FRONTEND
ACTIVE

GSAP ScrollTrigger on iOS Safari

HYPOTHESIS

Complex scroll-pinned GSAP animations cause frame drops below 30fps on iPhone 13 and older in Safari — a subset of our user base.

FINDING SO FAR

Confirmed: pinned sections with >3 GSAP targets drop to 22fps on iPhone 12 Safari. Mitigation: reduce pinned target count.

Running 6d
GSAP · iOS
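
The mitigation can be sketched with GSAP's real `ScrollTrigger` API: pin one section and scrub a single wrapper element, instead of tweening several children independently inside the pin. The selectors and scroll distance below are hypothetical; this is a configuration sketch, not the experiment's actual code.

```typescript
import gsap from "gsap";
import { ScrollTrigger } from "gsap/ScrollTrigger";

gsap.registerPlugin(ScrollTrigger);

// One pinned element, one animated target. Children of .feature-track
// move with it for free, keeping the per-frame work on older iPhones
// far below the >3-target setup that dropped to ~22fps.
gsap.to(".feature-track", {
  xPercent: -100,
  ease: "none",
  scrollTrigger: {
    trigger: ".feature-section", // hypothetical selector
    pin: true,
    scrub: true,
    end: "+=2000", // hypothetical scroll distance
  },
});
```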
INFRA
ACTIVE

n8n Self-Hosted Memory Profiling

HYPOTHESIS

n8n's default queue mode on shared hosting consumes 400MB+ at idle. Switching to webhook execution mode reduces idle RAM to under 150MB.

FINDING SO FAR

21 days in: webhook mode at 142MB idle. Queue mode at 380MB. Hypothesis confirmed. Documenting workflow constraints.

Running 21d
n8n · RAM
MOTION
ACTIVE

Lottie vs Rive — Production Tradeoffs

HYPOTHESIS

Rive's runtime renders at smaller file sizes than Lottie for equivalent animations — but has a steeper design tooling curve that may outweigh benefits.

FINDING SO FAR

3 days in: Rive at 60% of Lottie file size for test animation. Tooling curve is real — Figma→Rive workflow non-trivial.

Running 3d
Lottie · Rive
FRONTEND
ACTIVE

TanStack Query vs SWR — Cache Behaviour

HYPOTHESIS

TanStack Query's cache invalidation offers more granular control than SWR for product-level data, but adds ~12KB to bundle at tree-shaken minimum.

FINDING SO FAR

Bundle impact: +9.4KB (not 12KB as modelled). Cache granularity: TanStack clearly superior for optimistic updates.

Running 9d
React · Data
AI · INFRA
ACTIVE

Cloudflare Workers AI for Edge Inference

HYPOTHESIS

Running lightweight classification models at the Cloudflare edge will reduce latency for AI-routed requests by 200ms+ vs full Gemini API calls.

FINDING SO FAR

Edge inference latency: 40ms vs 320ms for Gemini. Quality gap on nuanced queries. Good for routing/classification, not answers.

Running 18d
Edge · CF
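
The routing pattern the finding points at — cheap edge model for classification, full model as fallback for nuanced queries — can be sketched framework-free. Both classifier signatures and the 0.85 confidence threshold are assumptions for illustration; neither is a Cloudflare Workers AI or Gemini API.

```typescript
type Classification = { label: string; confidence: number };

// Hypothetical signatures: a ~40ms edge classifier and a slower,
// higher-quality model call (~320ms in the experiment).
type EdgeClassifier = (query: string) => Promise<Classification>;
type FullModel = (query: string) => Promise<Classification>;

// Use the edge model when it is confident; escalate otherwise.
async function classifyQuery(
  query: string,
  edge: EdgeClassifier,
  full: FullModel,
  threshold = 0.85, // assumption, would be tuned from real data
): Promise<Classification & { source: "edge" | "full" }> {
  const fast = await edge(query);
  if (fast.confidence >= threshold) return { ...fast, source: "edge" };
  const slow = await full(query);
  return { ...slow, source: "full" };
}
```

The threshold is the whole experiment in miniature: set it too high and the latency win evaporates; too low and the "quality gap on nuanced queries" leaks through.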
FRAMEWORK
ACTIVE

Next.js App Router Static Export

HYPOTHESIS

Converting FinCalc to full static export using Next.js App Router will reduce TTFB to under 50ms globally via Cloudflare CDN without sacrificing dynamic calculator behaviour.

FINDING SO FAR

TTFB: 38ms globally (target met). Dynamic client hydration stable. Exploring for other content-heavy products.

Running 11d
Next.js · CDN
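
For reference, static export is a documented Next.js option set in the project config — Next.js 15 accepts a TypeScript config file. A minimal sketch might look like this; the exact settings depend on the app:

```typescript
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Emit a fully static site into `out/` at build time; dynamic
  // behaviour (like the calculators) runs client-side after hydration.
  output: "export",
  // Static hosts can't run the Next.js image optimiser, so disable it.
  images: { unoptimized: true },
};

export default nextConfig;
```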
SEO
ACTIVE

Rich Snippet Schema → CTR Lift Measurement

HYPOTHESIS

FAQ and HowTo schema on FinCalc calculator pages will increase organic CTR by 15%+ versus pages without structured data, within 60 days.

FINDING SO FAR

Day 28 of 60: CTR up 9.3% on schema pages vs control. Trending toward hypothesis. Insufficient data to conclude.

Running 28d
Schema · CTR
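
The structured data under test follows schema.org's `FAQPage` shape. A small helper can build the JSON-LD — the helper itself is an illustrative sketch, while the `@type`/`mainEntity` structure matches Google's documented FAQ rich-result format:

```typescript
type Faq = { question: string; answer: string };

// Build schema.org FAQPage JSON-LD for a calculator page.
function faqJsonLd(faqs: Faq[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  });
}
// Embed the result in a <script type="application/ld+json"> tag.
```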
AI · CONTENT
ACTIVE

Gemini Auto-Tagging for NammaHubballi

HYPOTHESIS

Gemini Flash can auto-categorise and tag new business listings for NammaHubballi at 90%+ accuracy versus manual human tagging.

FINDING SO FAR

Accuracy on 200 listings: 87.5%. Fails most often on ambiguous multi-category businesses. Human review needed for 12%.

Running 7d
Gemini · NLP
DATABASE
ACTIVE

PostgreSQL FTS vs Algolia for MNCJob

HYPOTHESIS

PostgreSQL native full-text search with tsvector and GIN index can match Algolia's search quality for MNCJob's job query patterns at zero extra cost.

FINDING SO FAR

Simple keyword search: PG FTS comparable. Fuzzy/typo-tolerant search: Algolia wins clearly. Verdict: depends on query mix.

Running 16d
PostgreSQL · Search
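
The PostgreSQL side of the comparison can be sketched as plain SQL held in strings for whatever client executes them. The table and column names (`jobs`, `title`, `description`) are hypothetical; the `tsvector`/GIN/`ts_rank` machinery is standard PostgreSQL full-text search.

```typescript
// One-time DDL: GIN index over the combined searchable text.
const createIndexSql = `
  CREATE INDEX IF NOT EXISTS jobs_search_idx
  ON jobs USING GIN (to_tsvector('english', title || ' ' || description));
`;

// Ranked keyword query; $1 is the user's search string.
// plainto_tsquery handles plain keyword input but — as the finding
// notes — offers no typo tolerance, which is where Algolia wins.
const searchSql = `
  SELECT id, title,
         ts_rank(to_tsvector('english', title || ' ' || description),
                 plainto_tsquery('english', $1)) AS rank
  FROM jobs
  WHERE to_tsvector('english', title || ' ' || description)
        @@ plainto_tsquery('english', $1)
  ORDER BY rank DESC
  LIMIT 20;
`;
```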
FRONTEND
ACTIVE

React 19 Compiler — Re-render Reduction

HYPOTHESIS

React 19's compiler will eliminate 60%+ of unnecessary re-renders in the NextGirl product catalog component tree without any manual memoisation.

FINDING SO FAR

React DevTools profiler: 54% re-render reduction. Below hypothesis but significant. No manual memo needed — clean code win.

Running 5d
React 19 · Perf
INFRA
ACTIVE

Deno Deploy for Webhook Receivers

HYPOTHESIS

Deno Deploy's edge functions will handle inbound webhook processing at lower latency and cost than PM2 Node.js process on shared VPS hosting.

FINDING SO FAR

2 days in: too early for conclusions. Cold start variance needs more data. Initial results inconsistent — continue.

Running 2d
Deno · Webhooks
WHY WE EXPERIMENT

Opinions Without Experiments Are Just Preferences.

Every stack decision, every architecture choice, every technology we recommend to clients was validated here first. CipherBitz Labs is the source of our engineering opinions — not conference talks, not blog posts, not vibes.

P.01

Every experiment starts with a hypothesis.

Before a single line of experiment code is written, the hypothesis is documented — what we believe to be true, how we will measure whether it is true, and what success and failure both look like quantitatively. An experiment without a hypothesis is just tinkering.

OUR RULE

If you cannot write the hypothesis in two sentences with a measurable outcome, you are not ready to run the experiment.

P.02

Failed experiments are documented more carefully than successes.

When an experiment succeeds, it is easy to document — the outcome is obvious and the path is clear. When an experiment fails, the failure reason must be understood and documented precisely — otherwise the same false hypothesis gets run again six months later by someone who wasn't in the room.

OUR RULE

Every failed experiment gets a post-mortem document: what we believed, what we found, and what we now know instead.

P.03

Every experiment has a defined end date.

Open-ended experiments are research projects without accountability. Every experiment in CipherBitz Labs has a time box — typically 2 to 8 weeks. At the end of the box, the experiment is either graduated, formally closed, or explicitly extended with a new hypothesis and new end date. No zombie experiments that run forever without results.

OUR RULE

If an experiment has been 'running' for more than 8 weeks without a result update, it is not an experiment — it is a backlog item.

P.04

Experiments run on real products — not sandboxes.

A sandbox experiment proves that something works in a sandbox. We run most of our experiments on the real CipherBitz product portfolio — real users, real traffic, real data, and real constraints. The friction of production is part of the experiment. If it only works in a clean environment, it does not work.

OUR RULE

Experiments that cannot be tested on a live product are deprioritised. Real-world constraint is a feature of good experiments.

GRADUATED TO PRODUCTION

Six Experiments That Became Real Products.

These started as hypotheses in Labs. They became the features, architectures, and decisions that define our six live products today.

GRADUATED SHIPPED

Next.js 15 App Router Migration

TESTED:

Whether App Router's React Server Components improve LCP on data-heavy product pages vs Pages Router with SSR.

PROVED:

LCP improved 34% on FreeBill and NextGirl product pages. Bundle size reduced. Cold start latency within acceptable threshold on cPanel-hosted Node.js.

GRADUATED SHIPPED

AI-Powered Outfit Recommendations

TESTED:

Whether Google Gemini Flash could produce contextually relevant outfit combinations from a product catalog based on occasion, season, and style preference.

PROVED:

85% user acceptance rate on AI-suggested outfits. Reduced session time to first add-to-cart by 22%. Gemini Flash sufficient — Pro model not needed.

GRADUATED SHIPPED

Hyperlocal City AI — Context Architecture

TESTED:

Whether a Gemini model grounded in curated city-specific data can answer hyperlocal questions more accurately than a general LLM with no local context.

PROVED:

Grounded model: 91% accuracy on local Q&A. Base Gemini: 43%. Context injection approach validated — scalable to multiple cities.
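
The context-injection approach can be sketched as a prompt builder that prepends curated city facts and instructs the model to answer only from them. The types, names, and prompt wording below are illustrative assumptions, not the deployed AskBLR prompt.

```typescript
type CityFact = { topic: string; fact: string };

// Ground the model in supplied context rather than parametric memory —
// the mechanism behind the 91% vs 43% accuracy gap described above.
function buildGroundedPrompt(
  city: string,
  facts: CityFact[],
  question: string,
): string {
  const context = facts.map((f) => `- [${f.topic}] ${f.fact}`).join("\n");
  return [
    `You are a local assistant for ${city}.`,
    `Answer ONLY from the context below. If the context does not`,
    `cover the question, say you don't know.`,
    ``,
    `Context:`,
    context,
    ``,
    `Question: ${question}`,
  ].join("\n");
}
```

Scaling to a new city then means curating a new fact set, not retraining anything — which is why the approach graduated.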

GRADUATED SHIPPED

FAQ Schema on Financial Calculators

TESTED:

Whether HowTo and FAQ schema on financial calculator pages produces rich snippets in Google Search and improves CTR.

PROVED:

Rich snippets appearing on 68% of schema-tagged pages. Average CTR lift: 19.2% over 60-day measurement period.

GRADUATED SHIPPED

PM2 Cluster Mode — Multi-Core Utilisation

TESTED:

Whether PM2 cluster mode distributes Next.js server load across all VPS cores and reduces P95 response time under simulated concurrent user load.

PROVED:

P95 response time: 220ms (single) vs 94ms (cluster, 4 cores). Worth operational complexity for products >500 concurrent users.

GRADUATED SHIPPED

Optimistic Updates for Invoice State

TESTED:

Whether TanStack Query's optimistic update pattern reduces perceived latency on invoice status changes in FreeBill without introducing data inconsistency.

PROVED:

Perceived save latency: 0ms (vs 340ms network round-trip). Zero data inconsistency in 2,400+ invoice operations tested. Pattern documented for team reuse.
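
Stripped of React and TanStack Query specifics, the optimistic-update pattern reduces to: apply the change immediately, keep a snapshot of the previous state, roll back if the server call fails. TanStack Query expresses this via its `onMutate`/`onError` hooks; the framework-free sketch below (all names hypothetical) shows the core mechanics.

```typescript
type Invoice = { id: string; status: string };

// Apply the status change locally before the network round-trip,
// so the UI sees 0ms perceived latency; restore the snapshot on error.
async function setStatusOptimistically(
  store: Map<string, Invoice>,
  id: string,
  status: string,
  save: (inv: Invoice) => Promise<void>,
): Promise<boolean> {
  const previous = store.get(id);
  if (!previous) return false;
  store.set(id, { ...previous, status }); // optimistic apply
  try {
    await save(store.get(id)!);
    return true;
  } catch {
    store.set(id, previous); // roll back to the snapshot
    return false;
  }
}
```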

FAILED EXPERIMENTS

Eight Failures. All Documented. None Hidden.

Publishing failures is how we prove the successes are real. Every experiment below ran, failed for a specific reason, and produced a documented learning. We will not build the same thing twice.

✗ CLOSED

Real-Time Collaborative Invoice Editing

4 weeks
WHY IT FAILED

WebSocket architecture required persistent server connections — incompatible with our cPanel shared hosting model. Could not implement without migrating to a cloud provider. Cost of migration outweighed the feature value at current user scale.

WHAT WE NOW KNOW

WebSocket features are blocked by hosting topology, not by engineering skill. Evaluate hosting before designing real-time features.

✗ CLOSED

Voice Search Integration — MNCJob

3 weeks
WHY IT FAILED

Web Speech API behaviour inconsistent across Chrome mobile, Safari iOS, and Firefox. In particular, iOS Safari 16.x had 40%+ error rate on job search queries due to continuous recognition session drops.

WHAT WE NOW KNOW

Web Speech API is not production-ready for critical search flows on mobile. Third-party voice SDKs add cost and dependency risk not justified by demand.
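
If the experiment were ever revisited, a first guard would be feature detection on the real (and still WebKit-prefixed) constructor names before exposing any voice UI. The helper below is an illustrative sketch; note that detection alone would not have caught the session-drop failures seen on iOS Safari.

```typescript
// SpeechRecognition / webkitSpeechRecognition are the actual global
// constructor names; the function takes the global object as a
// parameter so it can be tested without a browser.
function speechSearchAvailable(w: Record<string, unknown>): boolean {
  return Boolean(w["SpeechRecognition"] ?? w["webkitSpeechRecognition"]);
}
// Usage in a browser: speechSearchAvailable(globalThis as never).
// Treat voice as progressive enhancement, never the only search path.
```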

✗ CLOSED

Vercel Serverless — Cold Start Reduction

6 weeks
WHY IT FAILED

Cold start reduction techniques (edge middleware, prewarming, reduced bundle size) improved P95 cold start from 1400ms to 900ms — still above our 300ms target for product page loads. Serverless cold starts on complex Next.js pages are an architectural constraint, not an optimisation problem.

WHAT WE NOW KNOW

For products requiring consistent P95 under 300ms: always-on Node.js server (PM2) outperforms serverless. Serverless is wrong for our use case.

✗ CLOSED

AI Auto-Generated Business Descriptions

5 weeks
WHY IT FAILED

Gemini-generated business descriptions for NammaHubballi listings were factually plausible but frequently wrong about hours, specialties, and contact details — because the model hallucinated details not in its context. Publishing AI descriptions without human verification created incorrect listings.

WHAT WE NOW KNOW

AI-generated local business content requires human verification before publish. The cost of verification eliminated the automation benefit. Reverted to manual descriptions.

GraphQL for Product APIs · 3 weeks
→ No benefit over REST for single-product API
IndexedDB Offline Cache for FreeBill · 2 weeks
→ Sync complexity exceeded feature value
Stripe Payment on NextGirl (India) · 2 weeks
→ Razorpay required by Indian payment regulations
CSS Houdini Paint Worklet Backgrounds · 1 week
→ Zero iOS Safari support — unusable
QUEUED FOR LABS

What We're Designing Experiments For Next.

These are not product roadmap items. They are hypotheses that are interesting enough to test — but the experiment design isn't ready yet.

QUEUED · AI

Can Gemini 2.0 Replace Search For AskBLR?

THE QUESTION

Will Gemini 2.0's expanded context window allow full conversational city search without a traditional search index?

QUEUED · PERFORMANCE

Partial Prerendering — Real-World Benefit

THE QUESTION

Does Next.js Partial Prerendering produce measurable LCP improvement for e-commerce product catalog pages with mixed static/dynamic content?

QUEUED · AI · SEO

AI-Generated FAQ Pages — Indexing Rate

THE QUESTION

Do AI-generated FAQ pages for long-tail queries get indexed and rank as quickly as manually written equivalents with equivalent schema markup?

QUEUED · DATABASE

pgvector for Semantic Product Search

THE QUESTION

Can pgvector embeddings in PostgreSQL replace keyword search for NextGirl product discovery with better recall and no added infrastructure cost?

QUEUED · MOTION

GSAP vs CSS Scroll-Driven Animations

THE QUESTION

Does the Chrome-native CSS Scroll-Driven Animations API (2024) produce equivalent visual output to GSAP ScrollTrigger at lower JavaScript weight?

QUEUED · INFRA

Bun Runtime for Next.js on VPS

THE QUESTION

Does replacing Node.js with Bun as the Next.js runtime on a VPS server reduce memory consumption and improve request throughput meaningfully?

HOW TO ENGAGE WITH LABS

Three Ways to Connect With What We're Learning.

FREE · OPEN

Follow the Experiments

Every active experiment is documented publicly here — hypothesis, methodology, findings updated in real time. No email gate. No newsletter subscription. Just open research you can follow as it runs.

  • Live experiment status updates
  • Documented methodologies to reuse
  • Honest failure reports when experiments close
  • Findings before they become product decisions
FOR BUILDERS

Run Your Experiment Here

If you have a technical hypothesis about AI, frontend performance, or infrastructure that you want to test on real production systems with real data — we are open to structured collaboration on experiments that advance shared knowledge.

  • Access to live product data sets (anonymised)
  • Real traffic and production constraints
  • Joint documentation of findings
  • Published results credited to both parties
FOR CLIENTS

Apply Our Findings to Your Build

Every graduated experiment in Labs is directly applicable to client product builds. When we recommend a stack choice, a caching strategy, or an AI integration approach — it has been tested here first. You are not getting our opinions. You are getting our results.

  • Stack decisions backed by measured outcomes
  • Architecture choices proven on production data
  • Failed approaches documented — not repeated
  • Ongoing research informing every engagement
⟶ The experiments are live. The findings are public.

What Are You Curious About?

Tell us what you are trying to understand — a technology decision, a product architecture question, or an AI integration approach you are not sure about. We will tell you whether we have run an experiment on it, what we found, and what we would do differently.