
JanusCoderV-8B

JanusCoderV-8B is an 8B-parameter multimodal code intelligence model trained on JANUSCODE-800K, described by its authors as the largest multimodal code dataset. It generates code from visual inputs such as charts, screenshots, and animations.

Key Capabilities

  • Visual-to-code translation (charts, screenshots → HTML/code)
  • Layout bug fixing from images
  • Animation reconstruction (Manim)
  • 32K token context support
  • Multimodal understanding (text + images + code)
Benchmarks:
  • ChartMimic: 74.20 (ahead of Qwen2.5-VL-7B and InternVL3.5-8B)
  • WebCode2M: 18.28 (best structural correctness among open-weight models)
  • InteractScience: 33.32 (visual metrics leader)

Requirements

Hardware:
  • GPU: RTX 4090, A100, or H100
  • VRAM: 16GB minimum (with --bits8), 24GB+ recommended (BF16)
  • RAM: 16GB (32GB for large contexts)
  • Storage: 20GB (SSD recommended)
Software:
  • Ubuntu 22.04 LTS
  • CUDA 12.1+
  • Python 3.11
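A quick sanity check on the VRAM figures: storing the weights alone takes roughly parameters × bytes per parameter. The sketch below assumes about 8 billion parameters (taken from the "8B" in the model name) and deliberately ignores activations, the vision encoder's working memory, and the KV cache, all of which add to the total.

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Assumption: ~8e9 parameters, from the "8B" in the model name.
# Activations and the KV cache (which grows with context length) are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to store the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("bf16", "int8"):
        print(f"{dtype}: ~{weight_memory_gib(8e9, dtype):.1f} GiB of weights")
```

Under these assumptions BF16 weights come to roughly 15 GiB and 8-bit weights to roughly 7.5 GiB, which is why a 16GB card leaves little headroom unless you use --bits8.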

Deploy on Spheron

  1. Sign up at app.spheron.ai
  2. Add credits (card/crypto)
  3. Deploy: choose RTX 4090 → Region → Ubuntu 22.04 → SSH key → Deploy
Connect:
ssh -i <private-key-path> root@<your-vm-ip>

New to Spheron? Getting Started | SSH Setup

Installation

Setup Environment

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Install Python 3.11

sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel

Create Virtual Environment

python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
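To confirm the virtual environment is active before installing anything, you can compare sys.prefix with sys.base_prefix; inside a venv they differ. A minimal check, saved under a hypothetical name such as check_venv.py:

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (PEP 405)."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"python {sys.version.split()[0]} | venv active: {in_virtualenv()} | prefix: {sys.prefix}")
```

After `source ~/.venvs/py311/bin/activate`, running `python check_venv.py` should report Python 3.11 and a prefix under ~/.venvs/py311.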

Install PyTorch (CUDA)

pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
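Before going further it helps to confirm that the installed wheel actually sees the GPU. The sketch below is a stdlib-guarded report (it imports torch lazily, so it also runs before PyTorch is installed); the function name is illustrative. With the cu121 wheels above, cuda_build should read 12.1.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Summarize the PyTorch/CUDA setup without crashing if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch  # imported lazily so the check works even before installation

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["bf16_supported"] = torch.cuda.is_bf16_supported()
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False while nvidia-smi works, the CPU-only wheel was probably installed; rerun the pip command above with the cu121 index URL.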

Install Dependencies

pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests
pip install -U bitsandbytes
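The runner needs transformers 4.57.0 or newer, and --bits8 also needs bitsandbytes. A stdlib-only sketch to verify both; version_tuple is a deliberately simplified parser (it keeps the first three numeric components and ignores pre-release tags), so the packaging library's version parser is the better choice for anything stricter.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Turn '4.57.1' (or '4.57.0rc1') into (4, 57, 1) / (4, 57, 0) for comparison."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(package: str, minimum: str) -> bool:
    """True if `package` is installed at `minimum` or newer (simplified comparison)."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print("transformers>=4.57.0:", meets_minimum("transformers", "4.57.0"))
    print("bitsandbytes installed:", meets_minimum("bitsandbytes", "0"))
```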

Create Runner Script

Create run_januscoder.py with nano, vim, or an SSH-capable editor such as Cursor:

#!/usr/bin/env python3
 
# JanusCoderV-8B runner (InternVL head)
# Uses AutoModelForImageTextToText + AutoProcessor and supports URL/local images.
 
import argparse
import io
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText  # <-- key class
 
MODEL_NAME = "internlm/JanusCoderV-8B"
 
def load_image_from_url(url: str) -> Image.Image:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")
 
def load_image_local(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")
 
def main():
    parser = argparse.ArgumentParser()
    source_group = parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument("--image-url", type=str, help="URL of the image to process")
    source_group.add_argument("--image-path", type=str, help="Local path to the image file")
    parser.add_argument("--task", type=str, default="Please describe the image explicitly.", help="Task description for the model")
    parser.add_argument("--max-new-tokens", type=int, default=1024, help="Maximum number of new tokens to generate")
    parser.add_argument("--bits8", action="store_true", help="Load model in 8-bit mode (requires bitsandbytes)")
    parser.add_argument("--no-bf16", action="store_true", help="Force FP16 inputs instead of BF16")
    args = parser.parse_args()
 
    use_bf16 = (not args.no_bf16) and torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    input_dtype = torch.bfloat16 if use_bf16 else torch.float16
 
    print(f"torch={torch.__version__} | cuda={torch.cuda.is_available()} | bf16_ok={use_bf16} | dtype={input_dtype}")
 
    print("Loading processor …")
    processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)
 
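    print("Loading model …")
    # Always set torch_dtype so layers that stay unquantized run in the same dtype as the inputs
    load_kwargs = dict(device_map="auto", trust_remote_code=True, torch_dtype=input_dtype)
    if args.bits8:
        # Passing load_in_8bit directly to from_pretrained is deprecated; use a BitsAndBytesConfig
        from transformers import BitsAndBytesConfig
        load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, **load_kwargs).eval()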
 
    # Build messages with either URL or PIL image
    content = []
    if args.image_url:
        content.append({"type": "image", "url": args.image_url})
    else:
        pil_image = load_image_local(args.image_path)
        content.append({"type": "image", "image": pil_image})
    content.append({"type": "text", "text": args.task})
    messages = [{"role": "user", "content": content}]
 
    print("Tokenizing …")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
 
    # Move input tensors to model device/dtype
    device = next(iter(model.parameters())).device
    for key, value in list(inputs.items()):
        if torch.is_floating_point(value):
            inputs[key] = value.to(device, dtype=input_dtype)
        else:
            inputs[key] = value.to(device)
 
    print("Generating …")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=args.max_new_tokens, do_sample=False, use_cache=True)
    prompt_length = inputs["input_ids"].shape[1]
    generated_text = processor.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
 
    print("\n" + "=" * 80 + "\nOUTPUT:\n" + "=" * 80)
    print(generated_text)
 
if __name__ == "__main__":
    main()

Usage Examples

Basic Image Description

python run_januscoder.py \
  --image-url https://c7.alamy.com/comp/BHKEPY/woman-running-with-two-rottweilers-canis-lupus-familiaris-in-garden-BHKEPY.jpg

Generate HTML/CSS

python run_januscoder.py \
  --image-url https://example.com/mockup.jpg \
  --task "Generate responsive HTML+CSS from this mockup."

Local Images

python run_januscoder.py \
  --image-path /path/to/image.jpg \
  --task "Convert this UI mockup into React components."

Chart to Code

python run_januscoder.py \
  --image-url https://example.com/chart.png \
  --task "Generate matplotlib code to recreate this chart."

Fix Layout Bugs

python run_januscoder.py \
  --image-path screenshot.png \
  --task "Identify layout issues and provide corrected CSS."
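The --task strings above all name the output format explicitly. If you script many runs, a small helper can keep that consistent. build_task is a hypothetical helper for illustration, not part of the runner, and the wording it produces is just one reasonable template rather than a tuned prompt.

```python
def build_task(output_format: str, framework: str = "", extra: str = "") -> str:
    """Compose a --task prompt that states the target format and, optionally, a framework."""
    # Hypothetical helper: the phrasing is an illustrative template, not a tuned prompt.
    task = f"Generate {output_format}"
    if framework:
        task += f" using {framework}"
    task += " that reproduces this image."
    if extra:
        task += " " + extra.strip()
    return task


if __name__ == "__main__":
    print(build_task("HTML+CSS", "Tailwind", "Return a single self-contained file."))
    print(build_task("matplotlib code"))
```

Pass the returned string as the --task argument.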

Configuration

Arguments:
  • --image-url | --image-path - Input source (URL or local file; exactly one is required)
  • --task - Task description (default: "Please describe the image explicitly.")
  • --max-new-tokens - Output length (default: 1024)
  • --bits8 - 8-bit quantization (saves VRAM)
  • --no-bf16 - Force FP16 (compatibility mode)
8-bit Quantization (Low VRAM):
python run_januscoder.py --image-url URL --bits8
Long Output (Complex Tasks):
python run_januscoder.py --image-url URL --max-new-tokens 4096
FP16 Mode (GPU Compatibility):
python run_januscoder.py --image-url URL --no-bf16
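When driving the runner from other scripts, assembling the argument list in one place avoids typos in flag names. build_command is a hypothetical helper that emits only flags the runner defines and mirrors its rule that exactly one input source is given.

```python
import shlex
import sys


def build_command(image_url=None, image_path=None, task=None, max_new_tokens=None,
                  bits8=False, no_bf16=False, script="run_januscoder.py"):
    """Assemble an argv list for run_januscoder.py using only the flags it defines."""
    # The runner's argparse group requires exactly one of --image-url / --image-path.
    if (image_url is None) == (image_path is None):
        raise ValueError("pass exactly one of image_url or image_path")
    cmd = [sys.executable, script]  # sys.executable keeps the active venv's Python
    if image_url is not None:
        cmd += ["--image-url", image_url]
    else:
        cmd += ["--image-path", image_path]
    if task is not None:
        cmd += ["--task", task]
    if max_new_tokens is not None:
        cmd += ["--max-new-tokens", str(max_new_tokens)]
    if bits8:
        cmd.append("--bits8")
    if no_bf16:
        cmd.append("--no-bf16")
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command(image_path="screenshot.png", bits8=True)))
```

To execute rather than print, pass the list to subprocess.run(..., check=True).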

Performance Optimization

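Memory: Use --bits8, lower --max-new-tokens, and process one image per run (the runner takes a single image)
Speed: BF16 is selected automatically on GPUs that support it (A100, H100, RTX 4090); the KV cache is already enabled in the runner (use_cache=True)
Quality: High-resolution images, detailed prompts, more --max-new-tokens for complex tasks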
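The BF16 advice follows from hardware: BF16 tensor-core support starts with NVIDIA's Ampere generation (compute capability 8.0). The runner itself relies on torch.cuda.is_bf16_supported(); the sketch below only approximates that check with a capability comparison, using the GPUs named in this guide plus a T4 for contrast.

```python
# Compute capabilities of the GPUs mentioned in this guide, plus a T4 for contrast.
KNOWN_GPUS = {"T4": (7, 5), "A100": (8, 0), "RTX 4090": (8, 9), "H100": (9, 0)}


def bf16_capable(capability: tuple) -> bool:
    """Approximation: BF16 tensor-core support starts at compute capability 8.0 (Ampere)."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    for name, cap in KNOWN_GPUS.items():
        print(f"{name} (sm_{cap[0]}{cap[1]}): bf16={'yes' if bf16_capable(cap) else 'no'}")
    try:
        import torch  # optional: check the GPU in this machine too
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("this GPU:", cap, "bf16:", bf16_capable(cap))
    except ImportError:
        pass
```

On a pre-Ampere card, the runner falls back to FP16 on its own; --no-bf16 forces that fallback on any GPU.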

Use Cases

Web Development: Mockups → HTML/CSS, responsive layouts, layout fixes, React/Vue components
Data Viz: Charts → matplotlib/plotly code, interactive dashboards
Animation: Manim rebuilds, SVG generation, CSS animations
Documentation: Code explanations, visual docs, tutorials, GUI documentation

Troubleshooting

OOM Errors:
# Use 8-bit quantization
python run_januscoder.py --image-url URL --bits8
 
# Reduce tokens
python run_januscoder.py --image-url URL --max-new-tokens 512
 
# FP16 and BF16 take the same memory, so --no-bf16 will not fix OOM;
# check whether other processes are holding GPU memory instead
nvidia-smi
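To see how much GPU memory is free before a run (for example, to decide whether --bits8 is needed), you can query nvidia-smi in CSV form. The --query-gpu and --format options are standard nvidia-smi flags; parse_gpu_memory is an illustrative helper, and the values are in MiB because of the nounits format.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.free,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> list:
    """Parse nvidia-smi CSV rows (index, name, used, free, total) into dicts of MiB values."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, used, free, total = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "used_mib": int(used),
                     "free_mib": int(free), "total_mib": int(total)})
    return gpus


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for gpu in parse_gpu_memory(out):
            print(f"GPU {gpu['index']} ({gpu['name']}): {gpu['free_mib']} MiB free of {gpu['total_mib']}")
```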
Model Download Issues:
# TRANSFORMERS_CACHE is deprecated in recent transformers releases; HF_HOME is enough
export HF_HOME=/path/to/large/storage
python run_januscoder.py --image-url URL
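To check where the weights will land and whether that disk has room, the sketch below follows the precedence used by huggingface_hub (HF_HUB_CACHE, then HF_HOME/hub, then $XDG_CACHE_HOME or ~/.cache plus huggingface/hub). It is a simplified assumption of that logic and ignores legacy variables such as HUGGINGFACE_HUB_CACHE and TRANSFORMERS_CACHE; both function names are illustrative.

```python
import os
import shutil


def resolve_hub_cache(env=None) -> str:
    """Directory huggingface_hub downloads models into (simplified env-var precedence)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return os.path.join(os.path.expanduser(env["HF_HOME"]), "hub")
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(base, "huggingface", "hub")


def free_gib(path: str) -> float:
    """Free disk space (GiB) on the filesystem that holds `path`."""
    # The cache directory may not exist yet, so walk up to the nearest existing parent.
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    cache = resolve_hub_cache()
    print(f"model cache: {cache} ({free_gib(cache):.1f} GiB free)")
```

The Requirements section asks for 20GB of storage; if less is free, point HF_HOME at a larger disk. To fetch the weights ahead of time, huggingface_hub.snapshot_download("internlm/JanusCoderV-8B") downloads into the same cache.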
CUDA Errors:
# Verify CUDA
python -c "import torch; print(torch.cuda.is_available())"
nvidia-smi
 
# Reinstall PyTorch
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Image Loading:
# Download URL locally
wget https://example.com/image.jpg
python run_januscoder.py --image-path image.jpg --task "Your task"
 
# Fix permissions
chmod 644 /path/to/image.jpg
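Many image-loading failures come from a missing file, wrong permissions, or a download that saved an HTML error page instead of an image. check_image_file is a hypothetical stdlib-only pre-flight check that inspects the file's leading bytes. It accepts only PNG and JPEG, matching the formats recommended under Best Practices, even though Pillow (used by the runner) can read more.

```python
import os

# Leading "magic" bytes of the two recommended formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
}


def check_image_file(path: str) -> str:
    """Return 'PNG' or 'JPEG', or raise a descriptive error for common problems."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"not readable (try chmod 644): {path}")
    with open(path, "rb") as handle:
        header = handle.read(8)
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    raise ValueError(f"not a PNG or JPEG (possibly an HTML error page saved by wget): {path}")


if __name__ == "__main__":
    import sys
    print(check_image_file(sys.argv[1]))
```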

Best Practices

Prompts: Be specific, include format (HTML/CSS, Python), specify framework (React/Vue)
Images: High-res, clear, well-lit, cropped, standard formats (JPEG/PNG)
Output: Save to files, review before use, iterate on prompts, keep a record of prompts that work well
Resources: Monitor nvidia-smi, close unused processes, use quantization, batch tasks
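To save generated code to files, you can pull it out of the model's reply. This sketch assumes the model wraps code in Markdown-style triple-backtick fences, which is common but not guaranteed; extract_code_blocks and save_first_block are hypothetical helpers.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
BLOCK_RE = re.compile(FENCE + r"([\w+-]*)[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list:
    """Return (language, code) pairs for every fenced block in the model's reply."""
    return [(lang.lower(), code) for lang, code in BLOCK_RE.findall(text)]


def save_first_block(text: str, path: str, language: str = "") -> bool:
    """Write the first block (optionally of a given language) to `path`; False if none."""
    for lang, code in extract_code_blocks(text):
        if not language or lang == language.lower():
            Path(path).write_text(code, encoding="utf-8")
            return True
    return False
```

For example, save_first_block(reply, "index.html", "html") stores the HTML from a mockup-to-code run; a False return means the reply contained no matching fence and needs manual review.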

Integration Example

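Python Wrapper (each call starts a new process, so the model is reloaded every time):
import subprocess
import sys

# The runner prints logs first, then this banner, then the generated text
BANNER = "=" * 80 + "\nOUTPUT:\n" + "=" * 80 + "\n"

def generate_code_from_image(image_path, task, max_new_tokens=2048):
    # sys.executable reuses the active virtualenv's Python
    cmd = [sys.executable, "run_januscoder.py", "--image-path", image_path,
           "--task", task, "--max-new-tokens", str(max_new_tokens)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Keep only the model output, not the "Loading model …" progress lines
    return result.stdout.split(BANNER, 1)[-1].strip()

code = generate_code_from_image("mockup.png", "Generate React components")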
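Flask API (every request still spawns the runner and reloads the model):
import subprocess
import sys

from flask import Flask, jsonify, request

app = Flask(__name__)
BANNER = "=" * 80 + "\nOUTPUT:\n" + "=" * 80 + "\n"

@app.route("/generate", methods=["POST"])
def generate():
    data = request.get_json(silent=True) or {}
    if "image_url" not in data:
        return jsonify({"error": "image_url is required"}), 400
    cmd = [sys.executable, "run_januscoder.py", "--image-url", data["image_url"],
           "--task", data.get("task", "Describe")]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return jsonify({"error": result.stderr[-2000:]}), 500
    return jsonify({"code": result.stdout.split(BANNER, 1)[-1].strip()})

if __name__ == "__main__":
    app.run(port=5000)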
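A matching client can use only the standard library. This sketch assumes the Flask example is running on localhost:5000 and returns {"code": ...} on success; build_generate_request and call_generate are hypothetical names, and the 600-second timeout is an assumed generous value because each request reloads the model.

```python
import json
import urllib.request


def build_generate_request(base_url: str, image_url: str, task: str) -> urllib.request.Request:
    """Build a POST /generate request with the JSON body the Flask example expects."""
    body = json.dumps({"image_url": image_url, "task": task}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},  # required for request.get_json()
        method="POST",
    )


def call_generate(base_url: str, image_url: str, task: str, timeout: float = 600) -> str:
    """Send the request and return the generated text (raises HTTPError on 4xx/5xx)."""
    req = build_generate_request(base_url, image_url, task)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["code"]


if __name__ == "__main__":
    print(call_generate("http://localhost:5000", "https://example.com/chart.png",
                        "Generate matplotlib code to recreate this chart."))
```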

Performance (Spheron)

Time / VRAM per run:
GPU      | Simple   | HTML     | Complex (2K) | Full (4K)
RTX 4090 | 5s/12GB  | 10s/14GB | 20s/16GB     | 40s/18GB
A100     | 3s/12GB  | 6s/14GB  | 12s/16GB     | 24s/18GB

Supported Formats

Web: HTML/CSS, JavaScript, React, Vue, Tailwind, Bootstrap
Data Viz: Python (matplotlib, plotly), JS (D3, Chart.js), R (ggplot2)
Animation: Manim, CSS, JavaScript, SVG
Other: SVG, LaTeX, Processing, Three.js

Additional Resources