This guide will help you deploy your AI models using decentralized infrastructure. Whether you're a developer or AI enthusiast, you'll learn how to run your models with complete sovereignty - maintaining full control over your AI weights through decentralized storage.
- Key Features
- Before You Start
- Getting Started
- CLI Overview
- Running Models
- Using the API
- Advanced Usage
- Additional Information
- Migration Guide
- Need Help?
🌐 TRUE Decentralized Model Storage
- Unlike Ollama/LMStudio that rely on centralized repositories (Hugging Face, GitHub), CryptoAgents uses IPFS/Filecoin for permanent, censorship-resistant model distribution
- Models are stored across a distributed network - no single point of failure or control
- Access your models even if traditional platforms go down or restrict access
🔒 Ultimate Privacy with Local Execution
- 100% local inference - your data never touches external servers (unlike cloud AI services)
- Zero telemetry - no usage tracking, no model access logs, no data collection
- Air-gapped capability - run models completely offline once downloaded
- 🏛️ Sovereign Weights: Maintain complete ownership and control over your AI models
- 🛡️ Zero Trust Privacy: Your prompts, responses, and model usage remain completely private
- 🔗 OpenAI Compatibility: Use familiar API endpoints with your existing tools
- 👁️ Multi-Model Support: Works with both text and vision models
- ⚡ Parallel Processing: Efficient model compression and upload
- 🔄 Automatic Retries: Robust error handling for network issues
- 📊 Metadata Management: Comprehensive model information tracking
In an era of increasing AI centralization, CryptoModels puts you back in control:
- Own Your Models: Models are stored on decentralized infrastructure, not controlled by any single entity
- Private by Design: All inference happens locally on your hardware - no external API calls, no data collection
- Censorship Resistant: Decentralized storage ensures your models remain accessible regardless of platform policies
- Vendor Independence: Break free from proprietary AI services and their limitations
- macOS or Linux operating system
- Sufficient RAM for your chosen model (see model specifications below)
- Stable internet connection for model uploads
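If you're not sure whether your machine meets the RAM requirement above, a quick check with standard OS utilities helps before picking a model:

```bash
# macOS: report total physical memory in GB
sysctl -n hw.memsize | awk '{printf "%.1f GB\n", $1/1073741824}'

# Linux: human-readable summary of total and available memory
free -h
```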
bash mac.sh
Note: You'll need `llama.cpp.rb` in the same directory as `mac.sh`
bash ubuntu.sh
bash jetson.sh
- Activate the virtual environment:
source cryptomodels/bin/activate
Remember: Activate this environment each time you use the CryptoModels (`eai`) tools
- Verify your installation:
eai --version
CryptoModels uses a structured command hierarchy for better organization. All model operations are grouped under the `model` subcommand:
# Model operations
eai model run --hash <hash> # Run a model server
eai model run <model-name> # Run a preserved model (e.g., qwen3-1.7b)
eai model stop # Stop the running model server
eai model status # Check which model is running
eai model download --hash <hash> # Download a model from IPFS
eai model preserve --folder-path <path> # Upload/preserve a model to IPFS
# General commands
eai --version # Show version information
# Run a preserved model (user-friendly)
eai model run qwen3-1.7b --port 8080
# Run any model by hash
eai model run --hash bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y --port 8080
# Check status
eai model status
# Stop the running model
eai model stop
# Download a model locally
eai model download --hash bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y
# Upload your own model
eai model preserve --folder-path ./my-model-folder --task chat --ram 8.5
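Once `eai model run ...` reports that the server is up, you can confirm it is answering on the chosen port. The request below assumes the server exposes the OpenAI-style `/v1/models` listing (llama.cpp-based servers typically do); otherwise, `eai model status` is the supported check:

```bash
curl -s http://localhost:8080/v1/models
```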
We've prepared several models for you to test with. Each model is listed with its specifications and the command to run it.
| Model | Size | RAM | Command |
|---|---|---|---|
| qwen3-embedding-0.6b | 649 MB | 1.16 GB | `eai model run qwen3-embedding-0.6b` |
| qwen3-1.7b | 1.83 GB | 5.71 GB | `eai model run qwen3-1.7b` |
| qwen3-4b | 4.28 GB | 9.5 GB | `eai model run qwen3-4b` |
| qwen3-8b | 6.21 GB | 12 GB | `eai model run qwen3-8b` |
| qwen3-14b | 15.7 GB | 19.5 GB | `eai model run qwen3-14b` |
| qwen3-30b-a3b | 31 GB | 37.35 GB | `eai model run qwen3-30b-a3b` |
| qwen3-32b | 34.8 GB | 45.3 GB | `eai model run qwen3-32b` |
| Model | Size | RAM | Command |
|---|---|---|---|
| gemma-3-4b | 3.16 GB | 7.9 GB | `eai model run gemma-3-4b` |
| gemma-3-12b | 8.07 GB | 21.46 GB | `eai model run gemma-3-12b` |
| gemma-3-27b | 17.2 GB | 38.0 GB | `eai model run gemma-3-27b` |
| Model | Size | RAM | Command |
|---|---|---|---|
| gemma-3n-e4b | 7.35 GB | 10.08 GB | `eai model run gemma-3n-e4b` |
The API follows the OpenAI-compatible format, making it easy to integrate with existing applications.
If you would rather interact with your models through a user-friendly desktop interface, download the CryptoAgents app from our Agent Store: eternalai.org/agent-store.
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{"role": "user", "content": "Hello! Can you help me write a Python function?"}
],
"temperature": 0.7,
"max_tokens": 4096
}'
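If you want tokens as they are generated rather than a single response, the OpenAI-style `stream` flag should work as well. This is a sketch assuming the local server implements OpenAI-compatible streaming (server-sent events); `-N` tells curl not to buffer the output:

```bash
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [
      {"role": "user", "content": "Write a haiku about local inference."}
    ],
    "stream": true
  }'
```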
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image? Please describe it in detail."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/your-image.jpg"
}
}
]
}
],
"temperature": 0.7,
"max_tokens": 4096
}'
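If the image lives on your disk rather than at a URL, the OpenAI vision format also accepts base64 data URLs. The following is a sketch assuming the local server accepts data URLs (common for llama.cpp-based multimodal servers); `your-image.jpg` is a placeholder:

```bash
# Encode the image (GNU base64 needs -w 0 to disable line wrapping; macOS base64 does not wrap by default)
IMG_B64=$(base64 -w 0 < your-image.jpg 2>/dev/null || base64 < your-image.jpg)

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"local-model\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in detail.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,${IMG_B64}\"}}
      ]
    }]
  }"
```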
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"input": ["Hello, world!"]
}'
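To sanity-check the result, the OpenAI embeddings format places the vector at `data[0].embedding`; if you have `jq` installed, this prints the embedding's dimensionality:

```bash
curl -s -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "input": ["Hello, world!"]}' \
  | jq '.data[0].embedding | length'
```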
- A model in `gguf` format (compatible with `llama.cpp`)
- A Lighthouse account and API key
You can use `eai model preserve` to upload your own `gguf` models downloaded from Hugging Face and deploy them to the CryptoAgents platform.

The platform supports multiple model types through the `--task` parameter:

Use `--task chat` for conversational AI and text generation models.
- Download the model:
  - Go to Huggingface and download your desired `.gguf` model
  - Example: Download `Qwen3-8B-Q8_0.gguf`
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `qwen3-8b-q8`)
  - Place the downloaded `.gguf` file inside this folder
  - Rename the file to match the folder name, but remove the `.gguf` extension
Example Structure for Chat Models:
qwen3-8b-q8/ # Folder name
└── qwen3-8b-q8 # File name (no .gguf extension)
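In shell terms, the two steps above amount to something like this (the download path is hypothetical; substitute the file you actually fetched):

```bash
mkdir qwen3-8b-q8
# Move the downloaded file into the folder and drop the .gguf extension
mv ~/Downloads/Qwen3-8B-Q8_0.gguf qwen3-8b-q8/qwen3-8b-q8
```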
Use `--task embed` for text embedding and similarity models.
- Download the embedding model:
  - Go to Huggingface and download your desired embedding model in `.gguf` format
  - Example: text embedding models like `Qwen3 Embedding 0.6B` or specialized embedding models
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `qwen3-embedding-0.6b-q8`)
  - Place the downloaded `.gguf` file inside this folder
  - Rename the file to match the folder name, but remove the `.gguf` extension
Example Structure for Embedding Models:
qwen3-embedding-0.6b-q8/ # Folder name
└── qwen3-embedding-0.6b-q8 # File name (no .gguf extension)
Use `--task chat` for vision models, as they are conversational models with image understanding capabilities.
- Download the model files:
  - Go to Huggingface and download both required files:
    - The main model file (e.g., `gemma-3-4b-it-q4_0.gguf`)
    - The projector file (e.g., `mmproj-model-f16-4B.gguf`)
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `gemma-3-4b-it-q4`)
  - Place both downloaded files inside this folder
  - Rename the files to match the folder name, but remove the `.gguf` extension
  - Add the `-projector` suffix to the projector file
Example Structure for Vision Models:
gemma-3-4b-it-q4/ # Folder name
├── gemma-3-4b-it-q4 # Main model file (no .gguf extension)
└── gemma-3-4b-it-q4-projector # Projector file (no .gguf extension)
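As a concrete sketch (download paths are hypothetical; adjust to the files you fetched):

```bash
mkdir gemma-3-4b-it-q4
# Main model: rename to the folder name, dropping the .gguf extension
mv ~/Downloads/gemma-3-4b-it-q4_0.gguf gemma-3-4b-it-q4/gemma-3-4b-it-q4
# Projector: same base name plus the -projector suffix, .gguf dropped
mv ~/Downloads/mmproj-model-f16-4B.gguf gemma-3-4b-it-q4/gemma-3-4b-it-q4-projector
```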
Use the GGUF parser to estimate RAM usage:
npx @huggingface/gguf qwen3-8b-q8/qwen3-8b-q8 --context 32768
Basic Upload:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve --folder-path qwen3-8b-q8
Advanced Upload with Metadata:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve \
--folder-path qwen3-8b-q8 \
--task chat \
--ram 12 \
--hf-repo Qwen/Qwen3-8B-GGUF \
--hf-file Qwen3-8B-Q8_0.gguf \
--zip-chunk-size 512 \
--threads 16 \
--max-retries 5
Upload for Embedding Models:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve \
--folder-path qwen3-embedding-0.6b-q8 \
--task embed \
--ram 1.16 \
--hf-repo Qwen/Qwen3-Embedding-0.6B-GGUF \
--hf-file Qwen3-Embedding-0.6B-Q8_0.gguf
| Option | Description | Default | Required |
|---|---|---|---|
| `--folder-path` | Folder containing the model files | - | ✅ |
| `--task` | Task type: `chat` for text generation models, `embed` for embedding models | `chat` | ❌ |
| `--ram` | RAM usage in GB at 32768 context length | - | ❌ |
| `--hf-repo` | Hugging Face repository (e.g., `Qwen/Qwen3-8B-GGUF`) | - | ❌ |
| `--hf-file` | Original Hugging Face filename | - | ❌ |
| `--zip-chunk-size` | Compression chunk size in MB | 512 | ❌ |
| `--threads` | Number of compression threads | 16 | ❌ |
| `--max-retries` | Maximum upload retry attempts | 5 | ❌ |
The upload process involves several steps:
- Compression: The model folder is compressed using `tar` and `pigz` for optimal compression
- Chunking: Large files are split into chunks (default: 512 MB) for reliable uploads
- Parallel Upload: Multiple chunks are uploaded simultaneously for faster transfer
- Retry Logic: Failed uploads are automatically retried up to 20 times
- Metadata Generation: A metadata file is created with upload information and model details
- IPFS Storage: All files are stored on IPFS via Lighthouse.storage
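For intuition, the compression and chunking stages above are roughly equivalent to the following pipeline. This is a simplified sketch of the idea, not the tool's actual implementation, and it assumes `pigz` and the standard `split` utility are installed:

```bash
# Compress the model folder with tar + pigz (parallel gzip, 16 threads)
tar -cf - qwen3-8b-q8 | pigz -p 16 > qwen3-8b-q8.tar.gz

# Split the archive into 512 MB chunks for reliable, parallel uploads
split -b 512m qwen3-8b-q8.tar.gz qwen3-8b-q8.tar.gz.part-
```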
Common Issues:
- Missing API Key: Ensure `LIGHTHOUSE_API_KEY` is set in your environment
- Network Issues: The system will automatically retry failed uploads
- Insufficient RAM: Check the model's RAM requirements before uploading
- Invalid File Format: Ensure the model is in GGUF format
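A quick way to rule out the missing-key issue before starting a long upload (prints only the first few characters so the key isn't exposed in your terminal history):

```bash
# Fails with a clear message if LIGHTHOUSE_API_KEY is unset or empty
echo "${LIGHTHOUSE_API_KEY:?LIGHTHOUSE_API_KEY is not set}" | cut -c1-6
```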
- We support GGUF files compatible with llama.cpp
- Convert other formats using llama.cpp conversion tools
- Choose quantization levels (Q4, Q6, Q8) based on your hardware capabilities
- Higher quantization (Q8) offers better quality but requires more resources
- Lower quantization (Q4) is more efficient but may affect model performance
- Monitor system resources during model operation
- Use appropriate context lengths for your use case
- All models are stored on IPFS with content addressing
- This ensures model integrity and secure distribution
- API keys are stored securely in environment variables
- Models are verified before deployment
- Model Selection
  - Choose models based on your hardware capabilities
  - Consider quantization levels for optimal performance
  - Test models locally before deployment
- Resource Management
  - Monitor RAM usage during model operation
  - Adjust context length based on available memory
  - Use appropriate batch sizes for your use case
- API Usage
  - Implement proper error handling
  - Use appropriate timeouts for requests (see the curl sketch after this list)
  - Cache responses when possible
  - Monitor API usage and performance
- Deployment
  - Test models thoroughly before production use
  - Keep track of model versions and CIDs
  - Document model configurations and requirements
  - Keep regular backups of model metadata
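For the timeout and retry points under API Usage, plain curl already covers the basics. A minimal sketch with standard curl flags (the values shown are illustrative, not recommendations):

```bash
# --max-time caps the whole request; --retry handles transient network failures
curl --max-time 120 --retry 3 --retry-delay 2 \
  -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 16}'
```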
- Visit our website: eternalai.org
- Join our community: Discord
- Check our documentation for detailed guides and tutorials
- Report issues on our GitHub repository
- Contact support for enterprise assistance