
Provide rich context about Kaggle competitions to AI coding assistants
If you've ever tried to use an AI coding assistant for a Kaggle competition, you know the struggle:
- Hundreds of notebooks to sift through
- Context windows that fill up with imports and visualizations
- No easy way to extract the valuable insights
I built KaggleIngest to solve this.
## What is KaggleIngest?
It's an open-source tool that:
- Takes any Kaggle competition or dataset URL
- Ranks and downloads the top notebooks
- Extracts valuable patterns (skipping boilerplate)
- Outputs token-optimized context for LLMs
Live Demo: kaggleingest.com
GitHub: github.com/Anand-0037/KaggleIngest
## The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19 + Vite + TanStack Query |
| Backend | FastAPI + Python 3.13 + Redis |
| Deploy | Vercel (frontend) + Render (backend) |
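
The backend code isn't shown in this post, but here is a minimal sketch of how a FastAPI endpoint with Redis caching can be wired up. The route name, cache key, and `build_context` helper are illustrative, not the actual KaggleIngest code:

```python
import hashlib

import redis
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)


def build_context(url: str) -> str:
    # Placeholder for the download/rank/extract pipeline described below.
    return f"context for {url}"


@app.get("/ingest")
def ingest(url: str):
    # Cache by URL hash so repeat requests don't re-download notebooks.
    key = "ingest:" + hashlib.sha256(url.encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return {"context": cached, "cached": True}
    context = build_context(url)
    cache.set(key, context, ex=3600)  # expire after an hour
    return {"context": context, "cached": False}
```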
## Key Technical Challenges

### 1. Kaggle SDK Quirks
The official Kaggle SDK has some... interesting behaviors. When credentials are missing, it calls `exit(1)`:

```python
# This crashes your entire app!
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # exit(1) if no credentials
```
My solution: defer the import to a `get_client()` helper and wrap the call in a try/except that catches `SystemExit`:
```python
try:
    kaggle_service.get_client()
except SystemExit as e:
    logger.warning(f"Kaggle auth failed: {e}")
    return {"kaggle": False}
```
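
The post doesn't show `get_client()` itself; a plausible shape, assuming it does the import and authentication lazily so the failure surfaces at call time instead of at app startup:

```python
# Hypothetical get_client(); the real implementation isn't shown here.
def get_client():
    # Deferring the import keeps a missing kaggle.json from killing the app at startup.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # may call exit(1) when credentials are missing
    return api
```

Note that `SystemExit` inherits from `BaseException`, not `Exception`, so a plain `except Exception:` would not catch it.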
### 2. Smart Notebook Ranking
Not all notebooks are equal. A 5-year-old notebook with 1000 upvotes might be less useful than a recent one with 100.
I use a scoring formula:

```
score = log(upvotes + 1) * time_decay_factor
```

where `time_decay_factor` decreases for older notebooks.
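
A minimal sketch of this scoring, assuming an exponential decay with a one-year half-life (the half-life is an assumption for illustration; the actual decay function may differ):

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 365.0  # assumed: a notebook's weight halves for each extra year of age


def notebook_score(upvotes: int, last_updated: datetime) -> float:
    age_days = (datetime.now(timezone.utc) - last_updated).days
    time_decay_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return math.log(upvotes + 1) * time_decay_factor
```

With that assumed half-life, a 5-year-old notebook keeps only about 3% of its weight, so the old 1000-upvote notebook scores roughly log(1001) × 0.03 ≈ 0.2 while a fresh 100-upvote notebook scores log(101) ≈ 4.6.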
### 3. Token Optimization

LLM tokens are expensive, so I use TOON (Token-Optimized Object Notation) to keep the output compact:
```
// Standard JSON: 150 tokens
{
  "notebook_title": "Introduction to Ensembling",
  "notebook_author": "arthurtok",
  "upvotes": 3847
}

// TOON: 90 tokens
{"t":"Introduction to Ensembling","a":"arthurtok","v":3847}
```
That's 40% fewer tokens for the same information.
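
A minimal sketch of the key-shortening idea from the example above (the key map and `compact` helper are illustrative, not the actual KaggleIngest encoder):

```python
import json

# Illustrative key map; the real field abbreviations may differ.
KEY_MAP = {"notebook_title": "t", "notebook_author": "a", "upvotes": "v"}


def compact(record: dict) -> str:
    short = {KEY_MAP.get(k, k): v for k, v in record.items()}
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(short, separators=(",", ":"))


print(compact({
    "notebook_title": "Introduction to Ensembling",
    "notebook_author": "arthurtok",
    "upvotes": 3847,
}))
# {"t":"Introduction to Ensembling","a":"arthurtok","v":3847}
```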
## Try It Yourself

- Go to kaggleingest.com
- Paste a Kaggle URL (try https://www.kaggle.com/competitions/titanic)
- Download the context file
- Feed it to your favorite LLM
Star on GitHub if this was helpful!
Questions? Drop them in the comments!