AI & LLM Engineering Internship
What You'll Own:
You are responsible for making every model call in the platform as fast and cost-efficient as possible. You benchmark inference providers (Groq, Cerebras, Fireworks), configure speculative decoding, and ensure the Model Router always picks the fastest available free-tier endpoint.
Responsibilities:
Benchmark Groq, Cerebras, Together AI, Fireworks, and OpenRouter across latency, throughput, and accuracy on agentic tasks
Configure Ollama 0.3.x with speculative decoding for local fallback models
Build and maintain the ModelRouter class with automatic rate-limit detection and provider rotation
Profile memory and token usage per worker agent and reduce average cost per task by 40%
Write inference optimization documentation for the team
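To give a feel for the ModelRouter work above, here is a minimal sketch of provider rotation with rate-limit cooldowns. The provider names, the cooldown value, and the method names (report_rate_limit, pick) are all illustrative assumptions, not the platform's actual API:

```python
# Hypothetical sketch of a provider-rotating router. Providers are listed in
# preference order (fastest first); a provider that returns HTTP 429 is benched
# for cooldown_s seconds before it becomes eligible again.
import time


class ModelRouter:
    """Pick the first provider that is not currently rate-limited."""

    def __init__(self, providers, cooldown_s=60.0):
        self.providers = list(providers)   # preference order, fastest first
        self.cooldown_s = cooldown_s       # how long a 429'd provider sits out
        self._blocked_until = {}           # provider name -> unblock timestamp

    def report_rate_limit(self, name, now=None):
        # Call this when a request comes back 429: bench the provider.
        now = time.monotonic() if now is None else now
        self._blocked_until[name] = now + self.cooldown_s

    def pick(self, now=None):
        # Return the first provider whose cooldown has expired, or None
        # if every provider is rate-limited (caller should back off).
        now = time.monotonic() if now is None else now
        for name in self.providers:
            if self._blocked_until.get(name, 0.0) <= now:
                return name
        return None
```

In practice the real class would also detect 429s from each SDK's exception types and track per-provider request budgets, but the rotation logic stays this simple.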
Requirements:
Python (intermediate) — can read and write async code
Familiarity with REST APIs and JSON
Basic understanding of LLMs (you know what tokens, temperature, and context windows are)
Bonus: Prior exposure to Ollama, LM Studio, or any local model runner
You'll Learn: vLLM, Cerebras WSE-3 API, OpenRouter free routing, speculative decoding, provider failover architecture
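The provider benchmarking above can start as small as a latency harness like the one below. The call_provider argument is a hypothetical stand-in for a real client call (a Groq, Fireworks, or OpenRouter request); only the timing and percentile logic is shown:

```python
# Hypothetical latency harness: time n calls to a provider and report
# p50/p95 latency in milliseconds.
import statistics
import time


def benchmark(call_provider, n=20):
    """Run call_provider() n times and return p50/p95 latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call_provider()  # in real use: one chat-completion request
        samples.append((time.perf_counter() - t0) * 1000.0)
    # statistics.quantiles with n=100 yields 99 cut points;
    # index 49 is the 50th percentile, index 94 the 95th.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50_ms": cuts[49], "p95_ms": cuts[94]}
```

Throughput and accuracy need separate harnesses (tokens/second from streamed responses, task-level scoring for agentic benchmarks), but percentile latency is the baseline every provider comparison starts from.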
A. Coding + Problem Solving
Python (mandatory)
Basic DSA (arrays, strings, hashing)
Possible additional task: a short data preprocessing exercise