How it works.
One evening recently, I was sitting around a fire with some friends lamenting the fact that Google reviews had become absolute shit. You can’t trust the star rating because 80% of the reviews are complaints about the check taking too long to come or the waiter smelling bad or the manager being under 30. And the whole point of the stars in the first place is to have a quick glance at what the best options are, not sit there sifting through all the stupid stuff that has absolutely 0 impact on the quality of the food.
Reddit, however, is another story. People on Reddit are coming to answer questions in threads, like “I’m only in Denver for 3 months, where should I go?” Or, “what’s a hidden gem in Calgary that the locals love?” THAT is what I’m looking for. So that is what I attempted to make here with my good friend Claude Code (with inspiration from another Reddit review site that I find quite useful: redditrecs.com).
Anyway, here’s a rundown of how it works for anyone curious. If you’re smarter than me about this stuff and see room for improvement, I am ALL ears cause the manual parts of it are a lil tedious (I still have like 200 restaurants in Paris that need to be assigned cuisines and/or websites, oops).
Find the threads.
The pipeline was originally set up to pull Reddit threads from a curated list of subreddits per city based on popularity + keywords (restaurant, food, etc.). Turns out that plan sucked. The juicy food discussions live in threads that aren’t trending, and I don’t have the budget or the patience to scrape the entirety of Reddit’s database.
So I pivoted to a plain text file per city, where I handpick ~25–70 threads with real, relevant discussions about the food in that city, and the pipeline reads that list instead. Tedious, yes, but the signal density per thread is SO much better, and it’s more cost efficient than letting the scraper roam free.
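For the curious, reading a handpicked list like that can be sketched roughly as below. The one-URL-per-line layout, the `#` comment convention, and the function names are all assumptions for illustration, not necessarily how the real files look.

```python
from pathlib import Path


def parse_thread_list(text):
    """Return thread URLs in file order, skipping blanks, '#' comments, and duplicates."""
    urls = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a note-to-self
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls


def load_thread_list(path):
    """Read a per-city text file (hypothetical layout: one URL per line)."""
    return parse_thread_list(Path(path).read_text(encoding="utf-8"))
```

Dropping duplicates at this step means a thread pasted in twice by accident only gets scraped (and paid for) once.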
Another note: Reddit’s official API now gates new clients behind a support form approval that apparently takes weeks, so this project scrapes via Apify instead. Benefit: longevity (Apify can scrape back farther in time). Downside: $$$.
Reduce the noise.
Since I was trying to find threads that were both relevant and abundant per city (for variety!), some of them weren’t perfect. Some ended up not even really being about food, despite what the OP’s post suggested.
So to filter out the “best restaurants that closed during COVID, RIP” sort of nostalgia or the “my favorite restaurant is actually a grocery store and so is everyone else’s on this thread,” every thread gets a cheap relevance score from Claude Haiku on a 0.0–1.0 scale before extraction. Anything below a 0.4 gets dropped. For the most part, it catches the duds I missed and saves the extraction budget for the threads that actually have value.
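The gate itself is simple enough to sketch. Here `score_fn` stands in for the Haiku call (it is a hypothetical callable, so the example runs without any API), and the clamping is my own defensive assumption; only the 0.4 cutoff comes from the description above.

```python
RELEVANCE_THRESHOLD = 0.4  # from the post: anything below 0.4 gets dropped


def filter_relevant(threads, score_fn, threshold=RELEVANCE_THRESHOLD):
    """Return (thread, score) pairs for threads scoring at or above the threshold."""
    kept = []
    for thread in threads:
        score = float(score_fn(thread))
        # Assumption: clamp to 0.0-1.0 in case the scorer returns something out of range.
        score = max(0.0, min(1.0, score))
        if score >= threshold:
            kept.append((thread, score))
    return kept
```

Keeping the score next to each surviving thread makes it easy to eyeball the borderline ones later.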
Get the reviews.
Every comment in a thread that’s deemed relevant goes to Claude with a tool-use schema. The output is a list of structured evaluations: the restaurant name, an optional neighborhood, separate food and service sentiments, a verbatim quote, and vibe tags from a closed taxonomy (date_night, hidden_gem, special_occasion, etc, depending on the topic of the original post).
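A tool definition for that kind of output might look something like the sketch below. The `name` / `description` / `input_schema` layout is the shape the Anthropic SDK expects for tools, but the tool name, every field name, the sentiment labels, and which fields are required are my guesses for illustration. The vibe list only includes the three tags named above.

```python
import json

# Only the three tags named in the post; the real taxonomy is longer.
VIBE_TAGS = ["date_night", "hidden_gem", "special_occasion"]
SENTIMENTS = ["positive", "negative", "mixed"]  # assumed label set

# Hypothetical tool definition; field names are illustrative.
EXTRACT_TOOL = {
    "name": "record_evaluations",
    "description": "Record every restaurant evaluation found in a Reddit comment.",
    "input_schema": {
        "type": "object",
        "properties": {
            "evaluations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "restaurant_name": {"type": "string"},
                        "neighborhood": {"type": "string"},  # optional
                        "food_sentiment": {"type": "string", "enum": SENTIMENTS},
                        "service_sentiment": {"type": "string", "enum": SENTIMENTS},
                        "quote": {"type": "string"},  # verbatim from the comment
                        "vibe_tags": {
                            "type": "array",
                            "items": {"type": "string", "enum": VIBE_TAGS},
                        },
                    },
                    "required": ["restaurant_name", "quote"],
                },
            }
        },
        "required": ["evaluations"],
    },
}

if __name__ == "__main__":
    print(json.dumps(EXTRACT_TOOL, indent=2))
```

The `enum` on `vibe_tags` is what keeps the taxonomy closed: the model can only pick from the list, not invent a new tag.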
Food and service sentiments are intentionally kept apart. So great food and terrible service produce two separate ratings, not an averaged score (the whole impetus of the site, really). A restaurant won’t lose points if a waitress was rude or someone had to wait 45 minutes for a table because the restaurant IS POPULAR.
For comments with just the restaurant name, Claude pulls the context of the rest of the thread. So if a thread is titled “Best date night in Denver?” and someone just responds, “Tavernetta,” that counts as a positive food vote. Neutral search threads (“Where can I find sushi? Any sushi at all.”) don’t imply sentiment either way, so those bare names get logged as buzz only and surface on the card as “+ N more mentions”, separate from the food or service score.
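In practice the model makes this call, but the rule it follows can be written out as a tiny sketch. The `thread_kind` labels and the returned field names are hypothetical; only the behavior (positive food vote vs. buzz only, and the "+ N more mentions" label) comes from the description above.

```python
def classify_bare_mention(thread_kind):
    """Decide what a bare restaurant name counts as, given the thread's framing.

    thread_kind is a hypothetical label: "recommendation" for threads like
    "Best date night in Denver?", anything else for neutral searches.
    """
    if thread_kind == "recommendation":
        return {"food_sentiment": "positive", "buzz_only": False}
    # Neutral search threads imply no sentiment, so the mention is buzz only.
    return {"food_sentiment": None, "buzz_only": True}


def buzz_label(count):
    """Card text for buzz-only mentions; empty when there are none."""
    return f"+ {count} more mentions" if count > 0 else ""
```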
For non-English comments, the model returns both the original verbatim quote and a literal English translation. For now, this only applies to Paris reviews.
Match it to a real place.
“Sushi Den” is a string, so before it can be ranked, it needs a stable identity. This stage conducts a Google Places text search scoped to the city, picks the best candidate, and attaches a confidence score: 0.95 for clean single-result matches, 0.80 for top-ranked multi-result matches, down to 0.45 when several places could plausibly be the one. Anything under a 0.60 lands in an admin queue for me to go through manually.
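Here is a rough sketch of that confidence ladder. The three numbers and the 0.60 cutoff are the ones quoted above; how a match gets flagged as ambiguous is not spelled out, so the `ambiguous` flag (and treating zero candidates as 0.0) is an assumption, and the real version presumably has values in between.

```python
REVIEW_THRESHOLD = 0.60  # from the post: anything under this goes to the admin queue


def match_confidence(num_candidates, ambiguous=False):
    """Map a Places search outcome to a confidence score (simplified to three levels)."""
    if num_candidates <= 0:
        return 0.0  # assumption: nothing found means nothing to trust
    if num_candidates == 1:
        return 0.95  # clean single-result match
    if ambiguous:
        return 0.45  # several places could plausibly be the one
    return 0.80  # top-ranked result among several


def needs_manual_review(confidence):
    """True when a match should land in the admin queue."""
    return confidence < REVIEW_THRESHOLD
```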
After coming across several mentions of amusement parks and malls, I also added a rule to reject Google’s non-restaurant venue types. I’m personally not all that interested in the quality of food at Ball Arena.
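That rule boils down to a blocklist check on the `types` Google returns for a place. The three types below are real Google Places type strings, but which ones are actually on the reject list is my guess based on the examples mentioned.

```python
# Assumed blocklist; the real one is probably longer.
REJECTED_TYPES = {"amusement_park", "shopping_mall", "stadium"}


def is_restaurant_candidate(place_types):
    """Reject a place if any of its Google types is on the blocklist."""
    return not (set(place_types) & REJECTED_TYPES)
```

One rejected type is enough to drop the place, even if Google also tags it as something food-related.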
Score and rank.
Per restaurant, per aspect (food and service), the score is a smoothed positive rate:
score = (positive + 2.0) / (positive + negative + 2.0 + 1.5)
That’s a Beta(α=2, β=1.5) prior, a slight positive lean because Reddit mentions of a restaurant skew toward recommendations to begin with. The prior pulls low-volume restaurants toward its mean (2 / 3.5 ≈ 0.57, just above neutral) so a single rave doesn’t crown a place over one with forty mixed reviews. Mixed sentiments count as half a positive and half a negative.
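As a sketch, the formula plus the half-and-half rule for mixed sentiments fits in a few lines. The function and parameter names are mine; the constants are the ones from the formula.

```python
ALPHA = 2.0  # prior pseudo-positives
BETA = 1.5   # prior pseudo-negatives


def smoothed_score(positive, negative, mixed=0):
    """Smoothed positive rate; each mixed mention counts half positive, half negative."""
    pos = positive + 0.5 * mixed
    neg = negative + 0.5 * mixed
    return (pos + ALPHA) / (pos + neg + ALPHA + BETA)
```

With no mentions at all, the score sits at the prior mean, 2 / 3.5 ≈ 0.571. One positive mention moves it to 3 / 4.5 ≈ 0.667, not 1.0.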
City rank is sorted by food score with the total unique commenters as the tiebreaker (service score is shown but not taken into account when it comes to the ranking). Restaurants whose only signal is negative are hidden from the public list.
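The sort and the hide rule might be sketched like this. The dict keys are hypothetical, and "only signal is negative" is interpreted here as: at least one negative mention and no positive or mixed ones.

```python
def is_negative_only(restaurant):
    """True when a restaurant has negative mentions and nothing positive or mixed."""
    return (
        restaurant["negative"] > 0
        and restaurant["positive"] == 0
        and restaurant["mixed"] == 0
    )


def rank_city(restaurants):
    """Sort by food score, then unique commenters, both descending; hide negative-only places."""
    visible = [r for r in restaurants if not is_negative_only(r)]
    # Service score is deliberately absent from the sort key.
    return sorted(visible, key=lambda r: (-r["food_score"], -r["unique_commenters"]))
```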
Vibe tags are only given to a restaurant when at least two different commenters describe it that way, usually in a thread that’s asking about that specific vibe (ex: apparently everyone in Paris goes on their anniversary dinner to L’Oiseau Blanc, so it’s marked with a “date_night” vibe tag).
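The two-commenter rule is basically a count of distinct usernames per tag. A minimal sketch, assuming mentions arrive as `(commenter, tag)` pairs for one restaurant (that shape is my assumption):

```python
from collections import defaultdict

MIN_COMMENTERS = 2  # from the post: at least two different commenters


def assign_vibe_tags(mentions):
    """Return tags used by at least MIN_COMMENTERS distinct commenters, sorted by name."""
    commenters_by_tag = defaultdict(set)
    for commenter, tag in mentions:
        commenters_by_tag[tag].add(commenter)  # a set ignores repeats by the same person
    return sorted(
        tag for tag, who in commenters_by_tag.items() if len(who) >= MIN_COMMENTERS
    )
```

Using a set per tag means one very enthusiastic commenter repeating "date night" five times still only counts once.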
Manual reconciliation.
Low-confidence matches, ambiguous mentions, and the occasional sarcastic thread land in an admin queue. I review them, reassign mentions to the right restaurant, assign cuisines that Google hadn’t labeled, and recompute the city scores when I’m done (that part, thankfully, is not manual). It’s a bottleneck but (for now) a necessary quality assurance measure.
The stack.
- LLMs: Claude Haiku 4.5 for relevance and extraction, via the Anthropic SDK with tool-use for structured output.
- Data: Apify (Reddit scrape) and Google Places (canonical IDs, hours, geocoding).
- App: Next.js 16 App Router, React 19, Tailwind v4, Mapbox GL, deployed on Vercel.
- DB: Supabase (Postgres with PostGIS for the map).
- Pipeline: Python, kept as a separate project from the web app and writing into the same database.
- Built with: Claude Code (Opus 4.7) in Cursor.
Limitations.
Reddit comes with its own imperfections, of course. Cities with small or quiet food subreddits will have fewer mentions and noisier rankings. The Bayesian prior helps, but it can’t invent signal that isn’t there.
I’ll have to add cities manually over time. Right now, it’s only four that are relevant to myself and my close group of friends. (Note: you can request a city and it’ll show up on my admin page. If there are more than ~2 or 3 requests, I’ll go ahead and do it.)
I’ll refresh the data annually, maybe, assuming this is a pet project I still find useful. But a once-a-year refresh isn’t ideal, so... oh well.
Bare-name inference can misfire (somewhat frequently, tbh) on sarcastic threads. The extractor is conservative about it, but we all know LLMs don’t understand the subtleties of a wink emoji.
- Kelsey