JanusCoderV-8B
8B multimodal code intelligence model. Trained on JANUSCODE-800K (largest multimodal code dataset). Generates code from visual inputs (charts, screenshots, animations).
Key Capabilities
- Visual-to-code translation (charts, screenshots → HTML/code)
- Layout bug fixing from images
- Animation reconstruction (Manim)
- 32K token context support
- Multimodal understanding (text + images + code)
- ChartMimic: 74.20 (beats Qwen2.5VL-7B, InternVL3.5-8B)
- WebCode2M: 18.28 (best open-weight structural correctness)
- InteractScience: 33.32 (visual metrics leader)
Requirements
Hardware:
- GPU: RTX 4090, A100, or H100
- VRAM: 16GB minimum, 24GB+ recommended
- RAM: 16GB (32GB for large contexts)
- Storage: 20GB (SSD recommended)
Software:
- Ubuntu 22.04 LTS
- CUDA 12.1+
- Python 3.11
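The thresholds listed above can be checked before installing anything. The following is a minimal sketch using only the standard library. The names `check_requirements` and `free_disk_gb` are hypothetical, and the VRAM figure has to be supplied by hand (for example, read from nvidia-smi), since this sketch does not query the GPU itself.

```python
import shutil
import sys

# Thresholds copied from the requirements list above.
MIN_VRAM_GB = 16
RECOMMENDED_VRAM_GB = 24
MIN_DISK_GB = 20
REQUIRED_PYTHON = (3, 11)


def free_disk_gb(path="."):
    """Free space on the filesystem that holds `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def check_requirements(vram_gb, disk_gb, python_version):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) != REQUIRED_PYTHON:
        warnings.append(f"Python 3.11 expected, found {major}.{minor}")
    if vram_gb < MIN_VRAM_GB:
        warnings.append(f"{vram_gb}GB VRAM is below the {MIN_VRAM_GB}GB minimum")
    elif vram_gb < RECOMMENDED_VRAM_GB:
        warnings.append(
            f"{vram_gb}GB VRAM meets the minimum; {RECOMMENDED_VRAM_GB}GB+ is recommended"
        )
    if disk_gb < MIN_DISK_GB:
        warnings.append(f"{disk_gb:.0f}GB free disk is below the {MIN_DISK_GB}GB needed")
    return warnings


if __name__ == "__main__":
    # 24 is a placeholder; substitute the VRAM reported by nvidia-smi.
    results = check_requirements(24, free_disk_gb(), sys.version_info)
    print("\n".join(results) or "All checks passed")
```

Returning warnings instead of raising lets the caller decide whether a sub-recommended GPU is acceptable for a given workload.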
Deploy on Spheron
- Sign up at app.spheron.ai
- Add credits (card/crypto)
- Deploy → RTX 4090 → Region → Ubuntu 22.04 → SSH key → Deploy
ssh -i <private-key-path> root@<your-vm-ip>
New to Spheron? Getting Started | SSH Setup
Installation
Setup Environment
sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Install Python 3.11
sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
Create Virtual Environment
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
Install PyTorch (CUDA)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Install Dependencies
pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests
pip install -U bitsandbytes
Create Runner Script
Create run_januscoder.py (use nano, vim, or an SSH-capable editor such as Cursor):
#!/usr/bin/env python3
# JanusCoderV-8B runner (InternVL head)
# Uses AutoModelForImageTextToText + AutoProcessor and supports URL/local images.
import argparse
import io
import sys

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText  # <-- key class

MODEL_NAME = "internlm/JanusCoderV-8B"


def load_image_from_url(url: str) -> Image.Image:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")


def load_image_local(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")


def main():
    parser = argparse.ArgumentParser()
    source_group = parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument("--image-url", type=str, help="URL of the image to process")
    source_group.add_argument("--image-path", type=str, help="Local path to the image file")
    parser.add_argument("--task", type=str, default="Please describe the image explicitly.", help="Task description for the model")
    parser.add_argument("--max-new-tokens", type=int, default=1024, help="Maximum number of new tokens to generate")
    parser.add_argument("--bits8", action="store_true", help="Load model in 8-bit mode (requires bitsandbytes)")
    parser.add_argument("--no-bf16", action="store_true", help="Force FP16 inputs instead of BF16")
    args = parser.parse_args()

    use_bf16 = (not args.no_bf16) and torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    input_dtype = torch.bfloat16 if use_bf16 else torch.float16
    print(f"torch={torch.__version__} | cuda={torch.cuda.is_available()} | bf16_ok={use_bf16} | dtype={input_dtype}")

    print("Loading processor …")
    processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

    print("Loading model …")
    load_kwargs = dict(device_map="auto", trust_remote_code=True)
    if args.bits8:
        load_kwargs["load_in_8bit"] = True
    else:
        load_kwargs["torch_dtype"] = input_dtype  # Use torch_dtype for consistency
    model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, **load_kwargs).eval()

    # Build messages with either URL or PIL image
    content = []
    if args.image_url:
        content.append({"type": "image", "url": args.image_url})
    else:
        pil_image = load_image_local(args.image_path)
        content.append({"type": "image", "image": pil_image})
    content.append({"type": "text", "text": args.task})
    messages = [{"role": "user", "content": content}]

    print("Tokenizing …")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Move input tensors to model device/dtype
    device = next(iter(model.parameters())).device
    for key, value in list(inputs.items()):
        if torch.is_floating_point(value):
            inputs[key] = value.to(device, dtype=input_dtype)
        else:
            inputs[key] = value.to(device)

    print("Generating …")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=args.max_new_tokens, do_sample=False, use_cache=True)

    # Decode only the newly generated tokens, skipping the prompt
    prompt_length = inputs["input_ids"].shape[1]
    generated_text = processor.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
    print("\n" + "=" * 80 + "\nOUTPUT:\n" + "=" * 80)
    print(generated_text)


if __name__ == "__main__":
    main()
Usage Examples
Basic Image Description
python run_januscoder.py \
  --image-url https://c7.alamy.com/comp/BHKEPY/woman-running-with-two-rottweilers-canis-lupus-familiaris-in-garden-BHKEPY.jpg
Generate HTML/CSS
python run_januscoder.py \
  --image-url https://example.com/mockup.jpg \
  --task "Generate responsive HTML+CSS from this mockup."
Local Images
python run_januscoder.py \
  --image-path /path/to/image.jpg \
  --task "Convert this UI mockup into React components."
Chart to Code
python run_januscoder.py \
  --image-url https://example.com/chart.png \
  --task "Generate matplotlib code to recreate this chart."
Fix Layout Bugs
python run_januscoder.py \
  --image-path screenshot.png \
  --task "Identify layout issues and provide corrected CSS."
Configuration
Arguments:
- --image-url | --image-path: Input source (URL or local file); exactly one is required
- --task: Task description (default: "Please describe the image explicitly.")
- --max-new-tokens: Maximum output length in tokens (default: 1024)
- --bits8: 8-bit quantization (saves VRAM; requires bitsandbytes)
- --no-bf16: Force FP16 (compatibility mode)
python run_januscoder.py --image-url URL --bits8
python run_januscoder.py --image-url URL --max-new-tokens 4096
python run_januscoder.py --image-url URL --no-bf16
Performance Optimization
Memory: Use --bits8, lower --max-new-tokens, batch processing
Speed: CUDA config, BF16 on A100/H100, caching enabled
Quality: High-res images, detailed prompts, increase tokens for complex tasks
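The memory tips above (--bits8, a lower --max-new-tokens) can be applied automatically as a fallback ladder. The sketch below is one possible approach, not part of the runner script. The names `fallback_commands` and `run_with_fallbacks` are hypothetical, the 512-token floor is an arbitrary choice, and detecting failures by searching stderr for "out of memory" assumes PyTorch's usual CUDA error wording.

```python
import subprocess
import sys


def fallback_commands(image_url, task):
    """Yield runner commands from least to most memory-conservative."""
    base = [sys.executable, "run_januscoder.py", "--image-url", image_url, "--task", task]
    yield base
    yield base + ["--bits8"]
    yield base + ["--bits8", "--max-new-tokens", "512"]


def run_with_fallbacks(commands, runner=subprocess.run):
    """Run commands in order until one succeeds; retry only after OOM failures."""
    last = None
    for cmd in commands:
        last = runner(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if "out of memory" not in last.stderr.lower():
            break  # a different failure; using less memory will not help
    return last
```

Calling `run_with_fallbacks(fallback_commands(url, task))` returns the first successful result, or the last failed one so the caller can inspect its stderr. The `runner` parameter exists only so the logic can be exercised without a GPU.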
Use Cases
Web Development: Mockups → HTML/CSS, responsive layouts, layout fixes, React/Vue components
Data Viz: Charts → matplotlib/plotly code, interactive dashboards
Animation: Manim rebuilds, SVG generation, CSS animations
Documentation: Code explanations, visual docs, tutorials, GUI documentation
Troubleshooting
OOM Errors:
# Use 8-bit quantization
python run_januscoder.py --image-url URL --bits8
# Reduce tokens
python run_januscoder.py --image-url URL --max-new-tokens 512
# Use FP16
python run_januscoder.py --image-url URL --no-bf16
Model Cache Location:
export HF_HOME=/path/to/large/storage
export TRANSFORMERS_CACHE=/path/to/large/storage
python run_januscoder.py --image-url URL
CUDA Issues:
# Verify CUDA
python -c "import torch; print(torch.cuda.is_available())"
nvidia-smi
# Reinstall PyTorch
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Image Loading Issues:
# Download URL locally
wget https://example.com/image.jpg
python run_januscoder.py --image-path image.jpg --task "Your task"
# Fix permissions
chmod 644 /path/to/image.jpg
Best Practices
Prompts: Be specific, include format (HTML/CSS, Python), specify framework (React/Vue)
Images: High-res, clear, well-lit, cropped, standard formats (JPEG/PNG)
Output: Save to files, review before use, iterate prompts, track patterns
Resources: Monitor nvidia-smi, close unused processes, use quantization, batch tasks
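Saving output to files usually means separating the generated code from everything else the runner prints. The sketch below assumes two things: that stdout contains the OUTPUT banner printed by run_januscoder.py, and that the model wraps code in Markdown fences, which is common but not guaranteed. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Banner printed by run_januscoder.py just before the generated text.
BANNER = "OUTPUT:\n" + "=" * 80 + "\n"
FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def generated_text(stdout):
    """Return the text after the OUTPUT banner, or all of stdout if it is absent."""
    _, found, tail = stdout.partition(BANNER)
    return tail.strip() if found else stdout.strip()


def code_blocks(text):
    """Return (language, code) pairs for each fenced block in text."""
    return [(lang, body.strip("\n")) for lang, body in FENCE_RE.findall(text)]


def save_first_block(stdout, path):
    """Write the first fenced block (or the raw text if none) to path and return it."""
    text = generated_text(stdout)
    blocks = code_blocks(text)
    code = blocks[0][1] if blocks else text
    Path(path).write_text(code + "\n", encoding="utf-8")
    return code
```

Falling back to the raw text when no fence is found keeps the helper usable for plain descriptions. The saved file should still be reviewed before use.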
Integration Example
Python Wrapper:
import subprocess

def generate_code_from_image(image_path, task):
    # Run the runner script as a subprocess and capture everything it prints
    cmd = ["python", "run_januscoder.py", "--image-path", image_path, "--task", task, "--max-new-tokens", "2048"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # stdout includes the runner's log lines as well as the generated text
    return result.stdout

code = generate_code_from_image("mockup.png", "Generate React components")
Flask API:
# Requires: pip install flask
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    # Expects a JSON body with "image_url" and an optional "task"
    data = request.json
    cmd = ["python", "run_januscoder.py", "--image-url", data['image_url'], "--task", data.get('task', 'Describe')]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({"code": result.stdout})

if __name__ == "__main__":
    app.run(port=5000)
Performance (Spheron)
RTX 4090: Simple: 5s/12GB | HTML: 10s/14GB | Complex (2K): 20s/16GB | Full (4K): 40s/18GB
A100: Simple: 3s/12GB | HTML: 6s/14GB | Complex (2K): 12s/16GB | Full (4K): 24s/18GB
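To compare a deployment against the figures above, wall-clock time can be measured with a small harness. This is a generic sketch, not the method used to produce those numbers. `time_runs` is a hypothetical helper, and it measures time only, not VRAM.

```python
import statistics
import time


def time_runs(fn, runs=3):
    """Call fn() `runs` times and return (mean_seconds, max_seconds)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)
```

If `fn` launches run_januscoder.py as a subprocess, each call reloads the model, so the measured time includes loading as well as generation.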
Supported Formats
Web: HTML/CSS, JavaScript, React, Vue, Tailwind, Bootstrap
Data Viz: Python (matplotlib, plotly), JS (D3, Chart.js), R (ggplot2)
Animation: Manim, CSS, JavaScript, SVG
Other: SVG, LaTeX, Processing, Three.js
Additional Resources
- Model on HuggingFace
- GitHub Repository
- Getting Started - Spheron deployment
- API Reference - Programmatic access