2026
Nemotron Competition
Fine-tuning and prompt-engineering pipeline for the NVIDIA Nemotron competition. Benchmarks Claude and vLLM backends across six problem categories, with a Streamlit analytics dashboard tracking experiment runs and per-category accuracy.
LLM · fine-tuning · benchmarking · python
2026
Token Efficiency in Self-Improving Agents
Research into reducing token consumption in agents that iteratively rewrite or extend their own prompts and code. Exploring compression strategies, prompt distillation, and selective context retention to make recursive self-improvement economically viable.
agents · LLM · self-improvement · research
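One of the retention strategies mentioned above can be sketched in a few lines: keep only the highest-relevance messages that fit a token budget, then restore chronological order. This is a minimal illustration, not the project's implementation; the message fields, relevance scores, and word-count token estimate are all assumptions.

```python
# Sketch of selective context retention under a token budget.
# Assumptions: tokens approximated as whitespace-split words, and each
# message carries a precomputed relevance score (field names illustrative).

def retain_context(messages, budget):
    """Keep the highest-relevance messages whose combined token count
    fits the budget, then restore chronological order."""
    ranked = sorted(messages, key=lambda m: m["relevance"], reverse=True)
    kept, used = [], 0
    for msg in ranked:
        cost = len(msg["text"].split())  # crude token estimate
        if used + cost <= budget:
            kept.append(msg)
            used += cost
    # Preserve the original conversation order for the model.
    kept.sort(key=lambda m: m["turn"])
    return kept

history = [
    {"turn": 0, "relevance": 0.9, "text": "goal: refactor the parser"},
    {"turn": 1, "relevance": 0.2, "text": "ok thanks"},
    {"turn": 2, "relevance": 0.7, "text": "error: unexpected token at line 3"},
]
print([m["turn"] for m in retain_context(history, budget=10)])  # → [0, 2]
```

The low-relevance acknowledgement is dropped once the budget is exhausted, which is the economic lever the project targets: fewer tokens re-sent on every self-improvement iteration.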
2026
Robustness in Long-Running Agents (20+ Hours)
Investigating failure modes that emerge only over extended autonomous runs — context drift, compounding errors, resource leaks, and recovery strategies. Aims to produce a reliability benchmark suite for agents operating continuously beyond 20 hours.
agents · reliability · benchmarking · research
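One recovery strategy in scope can be sketched as checkpoint-and-rollback: persist agent state after every successful step and restore the last good state when a step fails, so a transient error does not end a multi-hour run. The step function, state shape, and failure injection below are hypothetical placeholders, not the benchmark suite itself.

```python
import json
import os
import tempfile

# Sketch of checkpoint-and-rollback recovery for a long-running agent loop.

def run_with_recovery(step, state, steps, checkpoint_path):
    """Run `step` repeatedly, checkpointing after each success and
    rolling back to the last checkpoint when a step raises."""
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)  # initial checkpoint, in case the first step fails
    for _ in range(steps):
        try:
            state = step(state)
        except Exception:
            with open(checkpoint_path) as f:
                state = json.load(f)  # roll back instead of crashing the run
            continue
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state

calls = {"n": 0}

def flaky_step(state):
    """Hypothetical step that fails once, simulating a transient error."""
    calls["n"] += 1
    if calls["n"] == 3:
        raise RuntimeError("simulated transient failure")
    return {"count": state["count"] + 1}

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
final = run_with_recovery(flaky_step, {"count": 0}, steps=5, checkpoint_path=path)
print(final["count"])  # → 4
```

The failed iteration is absorbed (four of five steps succeed), which is the behaviour a 20-hour reliability benchmark would want to measure rather than assume.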
2026
Identifying and Sourcing Alternative Datasets
Systematic methods for discovering non-standard training and evaluation data — scraping pipelines, synthetic generation, domain-specific crawls, and quality filtering. Focus on filling gaps where canonical benchmarks are saturated or misaligned with real-world tasks.
datasets · data-engineering · research
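The quality-filtering step mentioned above can be illustrated with a few standard heuristics: exact-duplicate removal, a minimum length, and a cap on single-word repetition. The thresholds below are illustrative defaults, not tuned values from the project.

```python
import hashlib

# Sketch of a heuristic quality filter for scraped text.

def quality_filter(docs, min_words=5, max_repeat_ratio=0.5):
    """Drop exact duplicates, very short documents, and documents
    dominated by a single repeated word."""
    seen, kept = set(), []
    for text in docs:
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate
        seen.add(digest)
        words = text.split()
        if len(words) < min_words:
            continue  # too short to carry signal
        top = max(words.count(w) for w in set(words))
        if top / len(words) > max_repeat_ratio:
            continue  # boilerplate-like repetition
        kept.append(text)
    return kept

corpus = [
    "a clean paragraph about model evaluation and datasets",
    "a clean paragraph about model evaluation and datasets",  # duplicate
    "too short",
    "spam spam spam spam spam spam buy now",
]
print(len(quality_filter(corpus)))  # → 1
```

Real pipelines would add near-duplicate detection and language identification on top, but the shape is the same: cheap filters first, expensive checks on what survives.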
2026
Large Data Handling Within Context Limits
Techniques for processing datasets that far exceed a model's context window — hierarchical summarisation, retrieval-augmented chunking, streaming state machines, and lossy compression with bounded information loss. Targeting practical patterns for production agent pipelines.
LLM · RAG · data-engineering · research
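Hierarchical summarisation, the first technique listed above, reduces to a simple two-level control flow: summarise fixed-size chunks, then summarise the concatenated chunk summaries. In this sketch `summarise` is a stub standing in for an LLM call (it keeps the first sentence, truncated to a character limit) so the structure is runnable; the chunk size and limit are arbitrary.

```python
# Sketch of hierarchical summarisation over a context-limit boundary.
# `summarise` is a stub standing in for an LLM call.

def summarise(text, limit=80):
    """Stub summariser: first sentence, truncated to `limit` chars."""
    return text.split(". ")[0][:limit].strip()

def chunk(words, size):
    """Yield fixed-size word windows over the document."""
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

def hierarchical_summary(document, chunk_words=50):
    """Summarise each chunk, then summarise the chunk summaries."""
    words = document.split()
    partials = [summarise(c) for c in chunk(words, chunk_words)]
    return summarise(". ".join(partials))

doc = ". ".join(f"sentence {i} about streaming large datasets" for i in range(60))
summary = hierarchical_summary(doc)
print(len(summary) <= 80)  # → True
```

With a real model behind `summarise`, each level stays within the context window while the tree as a whole covers a document far larger than it, which is the pattern the project aims to make production-ready.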