Model Weights File Formats in Machine Learning

As Machine Learning and AI technologies continue to advance, the need for efficient and secure methods to store, share, and deploy trained models becomes increasingly critical. Model weights file formats play a vital role in this process. These formats preserve a model’s learned parameters, enable reproducibility, and facilitate deployment across diverse environments and platforms. In this comprehensive blog post, we delve into the most popular model weight file formats in the industry, examining their origins, structures, use cases, and strengths.

  1. Why Model Weights File Formats Matter
  2. Overview of Popular Model Weights File Formats
  3. The Role of Pickle in Model Weights File Formats
    1. What is Pickle?
    2. Where is Pickle Used
    3. Pros
    4. Cons
  4. The Role of Protocol Buffers in Model Weights File Formats
    1. Where Protocol Buffers are Used
  5. Detailed Exploration of Model Weights File Formats
    1. .pt / .pth (PyTorch)
    2. .ckpt (TensorFlow)
    3. .h5 (HDF5)
    4. .onnx (Open Neural Network Exchange)
    5. .safetensors (Safetensors)
    6. .gguf (GGML Unified Format)
    7. .tflite (TensorFlow Lite)
    8. .engine (TensorRT)
    9. .mlmodel (Core ML)
  6. Choosing the Right Format
  7. Conclusion

Why Model Weights File Formats Matter

Model weight formats are more than just data containers. They:

  • Enable model portability and interoperability across tools and frameworks.
  • Preserve training progress for checkpointing and resumption.
  • Support deployment in resource-constrained environments.
  • Ensure security and integrity when sharing models.

Each format is designed with specific goals and trade-offs in mind, influencing how and where it is used.

Below is a summarized comparison of widely used model weight formats:

Format | Frameworks | Primary Use Case | Key Features
.pt / .pth | PyTorch | Training and inference | Flexible, framework-native
.ckpt | TensorFlow | Checkpointing and training resumption | Robust, efficient for large models
.h5 | Keras / TensorFlow | Saving full models in one file | Includes model, weights, and optimizer state
.onnx | Cross-platform | Model interoperability and deployment | Open standard, hardware-optimized inference
.safetensors | Hugging Face (PyTorch) | Secure and fast model sharing | Zero-copy loading, no code execution on load
.gguf | GGML-based (LLaMA.cpp) | Efficient LLM inference | Quantization-ready, CPU/GPU optimized
.tflite | TensorFlow Lite | Mobile and edge inference | Lightweight, hardware-accelerated
.engine | TensorRT | GPU inference optimization | High-performance, precision-tunable deployment
.mlmodel | Core ML (Apple) | iOS/macOS deployment | Apple-native, optimized for Apple Silicon

The Role of Pickle in Model Weights File Formats

Python’s pickle module is foundational in many machine learning file formats, particularly within the PyTorch and broader Python ecosystem. Though not a formal model format itself, pickle is often the underlying mechanism used to serialize and deserialize model weights and configurations.

What is Pickle?

Pickle is a standard Python library that converts Python objects into a byte stream (serialization) and restores them (deserialization). It allows objects, such as model weights, entire models, or training histories, to be saved to disk.

A diagram illustrating how Python's pickle module serializes objects into byte streams, which can be stored in files, databases, or memory, and later deserialized back into objects. Arrows clearly show the flow between serialization and deserialization phases.
Fig 2. Pickle Serialization and Deserialization Process in Python
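
The round trip described above can be sketched in a few lines. This is a minimal illustration using only the standard library; the `weights` dictionary is a made-up stand-in for real model parameters, not an actual tensor structure.

```python
import pickle

# Hypothetical stand-in for a model's learned parameters (plain lists, not tensors).
weights = {
    "layer1.weight": [[0.1, -0.2], [0.3, 0.4]],
    "layer1.bias": [0.0, 0.5],
}

# Serialization: Python object -> byte stream.
blob = pickle.dumps(weights, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialization: byte stream -> an equal, but distinct, Python object.
restored = pickle.loads(blob)

print(type(blob).__name__, restored == weights, restored is weights)
```

The same byte stream could be written to a file with `pickle.dump()` and read back with `pickle.load()`.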

Where is Pickle Used?

  • PyTorch: The .pt and .pth formats use pickle to serialize model state dictionaries or entire models.
  • Scikit-learn: Models are often saved using .pkl, which is a direct result of using pickle.dump().
  • XGBoost & LightGBM: While they support native formats, pickle is sometimes used for quick saving in Python environments.

Pros:

  • Native to Python: Easy to use, especially for Python developers.
  • Flexible: Can store virtually any Python object, including complex models.

Cons:

  • Security Risk: Loading a pickled file can execute arbitrary code if the file is from an untrusted source. This makes it unsuitable for public sharing or web-facing applications.
  • Lack of Interoperability: Pickled files cannot be easily loaded outside Python or across different framework versions.
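
To see why the security risk is real, consider this deliberately harmless sketch. The `Payload` class is hypothetical; it uses `__reduce__` to tell pickle which callable to invoke at load time. Here the callable only evaluates an arithmetic expression, but a malicious file could name any importable function instead.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments".
    def __reduce__(self):
        # Harmless arithmetic here; an attacker would pick something destructive.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# No Payload object comes back. The callable chosen by the file's author runs,
# and its return value is what loads() hands to the caller.
result = pickle.loads(blob)
print(result)
```

The loader never asked to run `eval`; the decision was made entirely by whoever produced the bytes.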

The Role of Protocol Buffers in Model Weights File Formats

Developed by Google, Protocol Buffers (protobuf) is an encoding layer rather than a model format in its own right: developers rarely save or load a "protobuf file" directly, but several model formats use it underneath. It offers a compact, schema-driven binary serialization, which suits structured data such as model architectures and weights.

A step-by-step diagram showing how Protocol Buffers work, from creating a .proto file, compiling it using the protoc compiler, integrating it into project code, and using the generated classes to serialize and deserialize data for machine learning models.
Fig 3. Protocol Buffers Workflow for Model Serialization[Source]

Where Protocol Buffers are Used:

  • TensorFlow .pb files: These are frozen models serialized using Protobuf, combining both architecture and trained parameters for deployment.
  • ONNX (.onnx): The entire ONNX standard is built upon the Protobuf serialization schema.
  • Apple Core ML (.mlmodel): Uses Protobuf under a custom schema to define model components.
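
To make "compact binary serialization" concrete, here is a minimal sketch of the varint encoding that Protocol Buffers use for integer fields. It is written from scratch with the standard library and is not the protobuf package's API; the helper names `encode_varint`, `decode_varint`, and `encode_int_field` are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)

def decode_varint(data, pos=0):
    """Return (value, next_position) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_int_field(field_number, value):
    """Encode one integer field: a tag (field number, wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)

# Field number 1 holding the value 150 takes three bytes on the wire.
message = encode_int_field(1, 150)
print(message.hex())
```

Small numbers take a single byte and field names never appear in the output, which is a large part of why protobuf-based model files stay compact.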

Detailed Exploration of Model Weights File Formats

.pt / .pth (PyTorch)

PyTorch’s native model weight formats are among the most widely used in research and industry today. Introduced by Facebook AI in 2016, .pt and .pth formats store either the entire model or just its state dictionary (weights and biases). They rely on Python’s pickle module for serialization.

A minimalistic graphic showing .pt and .pth file extensions in bold black text, accompanied by the PyTorch flame icon in orange, symbolizing their role as native model weights file formats for PyTorch models.
Fig 4. PyTorch’s .pt and .pth Model Formats

Key Characteristics:

  • Highly flexible, making it easy to save and load models for research workflows.
  • Commonly used with Hugging Face models, especially for transformer-based architectures.
  • .pth and .pt are functionally identical; naming is a matter of convention.

Considerations:

  • While widely supported within PyTorch, these formats are not natively portable to other frameworks.
  • Pickle-based loading poses potential security risks when files come from untrusted sources.
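
One common mitigation is to restrict what the unpickler may resolve. Recent PyTorch versions expose a `weights_only` option on `torch.load` for this purpose; the sketch below is not PyTorch's implementation but a standard-library illustration of the same allow-list idea. The names `AllowListUnpickler`, `safe_loads`, and the tiny `ALLOWED` set are hypothetical.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list of (module, name) pairs the loader may resolve.
# A real tensor loader would need more entries than this.
ALLOWED = {("collections", "OrderedDict")}

class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the byte stream references a global by name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(blob):
    return AllowListUnpickler(io.BytesIO(blob)).load()

# A state-dict-like object built from plain containers loads normally.
state = OrderedDict([("fc.weight", [[0.1, 0.2]]), ("fc.bias", [0.0])])
restored = safe_loads(pickle.dumps(state))

# A payload that tries to reach an arbitrary callable is rejected.
class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))

try:
    safe_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(restored == state, blocked)
```

An allow-list narrows the attack surface but does not make pickle a safe interchange format; it only limits which callables a file can reach.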

.ckpt (TensorFlow)

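The checkpoint format in TensorFlow, denoted by .ckpt, saves training state: variable values (weights) and optimizer state. A checkpoint is a set of files sharing a common prefix: one or more .data shards (for example .data-00000-of-00001), an .index file, and, for legacy TensorFlow 1.x checkpoints, a .meta file holding the serialized graph.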

A black-and-white icon of a document labeled .ckpt (TensorFlow), symbolizing TensorFlow’s checkpoint file format used to save training weights, states, and configurations during model development.
Fig 5. TensorFlow Checkpoint Format represented as a .ckpt file

Key Characteristics:

  • Enables resuming training exactly where it left off.
  • Ideal for training large models and storing intermediate results.
  • Widely used in models like BERT and T5 from Google.

Considerations:

  • Checkpoints are not well-suited for deployment; models are typically converted to TensorFlow’s SavedModel format or to .tflite.
  • File management can be more complex due to the multiple-part structure.
  • Does not contain the model architecture; it requires the original code to rebuild the model structure.
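
Because one checkpoint is spread over several files, it helps to treat the shared prefix as the unit of management. The helper below is a minimal standard-library sketch, assuming the usual `<prefix>.index` / `<prefix>.data-…` / `<prefix>.meta` naming; the function names and the example file names are hypothetical.

```python
from collections import defaultdict

def group_checkpoint_files(filenames):
    """Group checkpoint parts by their shared prefix, e.g. 'model.ckpt-1000'."""
    groups = defaultdict(list)
    for name in filenames:
        if name.endswith(".index"):
            prefix = name[: -len(".index")]
        elif name.endswith(".meta"):
            prefix = name[: -len(".meta")]
        elif ".data-" in name:
            prefix = name.rpartition(".data-")[0]
        else:
            continue  # not a checkpoint part (e.g. the 'checkpoint' bookkeeping file)
        groups[prefix].append(name)
    return dict(groups)

def is_complete(parts):
    """A usable checkpoint needs an index file and at least one data shard."""
    has_index = any(p.endswith(".index") for p in parts)
    has_data = any(".data-" in p for p in parts)
    return has_index and has_data

# Hypothetical directory listing: the second checkpoint is missing its index.
files = [
    "checkpoint",
    "model.ckpt-1000.index",
    "model.ckpt-1000.data-00000-of-00001",
    "model.ckpt-1000.meta",
    "model.ckpt-2000.data-00000-of-00001",
]
groups = group_checkpoint_files(files)
complete = {prefix: is_complete(parts) for prefix, parts in groups.items()}
print(complete)
```

Copying or deleting a checkpoint should always operate on the whole group; a data shard without its index cannot be restored.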

.h5 (HDF5)

Adopted early by Keras, the .h5 (HDF5) format allows the storage of the entire model architecture, weights, and optimizer state in a single file. This format is intuitive for sharing and inspecting model components.

A square orange icon displaying .h5 (HDF5) in bold black text, representing the HDF5 file format used in Keras and TensorFlow to store full model architecture, weights, and optimizer state in a single file.
Fig 6. Keras-style .h5 Model file icon for HDF5 Format

Key Characteristics:

  • User-friendly and well-documented.
  • Suitable for small to medium-sized models.
  • Offers a clear structure to inspect and modify weights or model layers.

Considerations:

  • Increasingly replaced by TensorFlow’s SavedModel format.
  • Not as efficient for very large models or complex serialization needs.

.onnx (Open Neural Network Exchange)

ONNX is an open-source format jointly developed by Microsoft and Facebook. It allows models trained in one framework (like PyTorch) to be exported and run elsewhere, for example in ONNX Runtime or another ONNX-compatible backend, promoting cross-platform compatibility.

A diagram showing how ONNX serves as a central bridge between machine learning frameworks like PyTorch, TensorFlow, Keras, and others, allowing models to be exported to and loaded from the ONNX format to achieve seamless interoperability across platforms.
Fig 7. ONNX Format enabling Cross-framework Model Interoperability[Source]

Key Characteristics:

  • Encodes model architecture and weights in a Protobuf (protocol buffer) format.
  • Widely supported across frameworks and hardware accelerators.
  • Popular for deploying models in production environments.

Considerations:

  • Custom layers and operations may require ONNX-compatible rewrites or extensions.
  • Conversion tools (e.g., torch.onnx.export) can have limitations or require manual adjustments.

.safetensors

Created by Hugging Face, .safetensors addresses the security risks of pickle-based formats like .pt. It is a binary format optimized for safe and fast tensor storage, especially useful when models are shared publicly.

A banner-style image from Hugging Face showing the safetensors repository, promoting it as a secure and fast model weights file format for tensor storage. Includes GitHub stats like contributors, stars, and forks, with a hugging emoji symbolizing safe sharing.
Fig 8. Hugging Face’s Safetensors: Secure and Efficient Tensor Storage[Source]

Key Characteristics:

  • Enables zero-copy loading, which significantly reduces loading time.
  • Eliminates code execution risk during deserialization.
  • Became the default for Hugging Face’s Transformers library.

Considerations:

  • Stores only tensors; model architecture must be defined separately in code.
  • Less flexible for full model serialization, but ideal for safe inference sharing.
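
The reason loading cannot execute code is visible in the file layout: an 8-byte little-endian header length, a JSON header describing each tensor (dtype, shape, byte offsets), and then raw tensor bytes. The sketch below writes and reads that layout with the standard library for float32 data only. It is an illustration of the structure, not the safetensors library's API, and the function names are hypothetical.

```python
import json
import struct

def write_safetensors_like(tensors):
    """tensors: dict of name -> (shape, flat list of float32 values). Returns bytes."""
    header = {}
    payload = bytearray()
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(payload)

def read_safetensors_like(blob):
    """Parse the layout back into name -> (shape, values). Only F32 is handled."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    data = memoryview(blob)[8 + header_len :]
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string metadata
            continue
        begin, end = info["data_offsets"]
        count = (end - begin) // 4
        out[name] = (info["shape"], list(struct.unpack(f"<{count}f", data[begin:end])))
    return out

blob = write_safetensors_like({
    "w": ((2, 2), [1.0, 2.0, 0.5, -1.5]),
    "b": ((2,), [0.0, 0.25]),
})
restored = read_safetensors_like(blob)
print(restored["w"], restored["b"])
```

Reading involves only JSON parsing and byte slicing, so there is no step at which the file can name a function to call. The fixed offsets are also what allow a real loader to memory-map tensors instead of copying them.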

.gguf (GGML Unified Format)

Designed for efficient inference of large language models (LLMs), .gguf is part of the llama.cpp ecosystem. It integrates weights, tokenizer, and metadata into a single quantization-friendly file.

A technical breakdown of the .gguf (GGML Unified Format) file layout, showing how it stores tensor metadata, offsets, version info, and key-value pairs for efficient quantized inference in large language models like LLaMA.
Fig 9. Internal Structure of a .gguf file for LLM Inference[Source]

Key Characteristics:

  • Supports multiple levels of quantization (Q4, Q5, etc.), reducing memory and compute requirements.
  • Ideal for deployment on CPU/GPU in resource-limited environments.
  • Widely used in the open-source LLM community (e.g., Mistral, LLaMA).

Considerations:

  • Tailored specifically for llama.cpp and similar inference engines.
  • Not suited for training or use outside GGML-compatible tools.
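
The memory savings from quantization come from storing small integer codes plus a shared scale per block of weights. The sketch below shows a simplified symmetric 4-bit scheme in plain Python. It illustrates the idea only and is not the exact block layout or rounding used by llama.cpp; the function names and sample values are hypothetical.

```python
def quantize_block(values, bits=4):
    """Map floats to signed integer codes sharing one scale (symmetric scheme)."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit codes
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0    # avoid dividing by zero for all-zero blocks
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate floats from the codes."""
    return [scale * c for c in codes]

block = [0.7, -0.35, 0.1, 0.0, -0.7, 0.2, 0.05, 0.6]
scale, codes = quantize_block(block)
approx = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, approx))

# Storage for a block of 32 weights: 4-byte floats versus
# 4-bit codes plus one 2-byte scale (an assumed block layout).
fp32_bytes = 32 * 4
q4_bytes = 32 * 4 // 8 + 2

print(codes, round(max_error, 4), fp32_bytes, q4_bytes)
```

Under these assumptions a block shrinks from 128 bytes to 18, at the cost of a rounding error of at most half a scale step per weight.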

.tflite (TensorFlow Lite)

.tflite is a FlatBuffer format developed by Google to optimize TensorFlow models for edge and mobile devices. It provides a compact model representation suited to on-device inference.

A vibrant yellow graphic illustrating the .tflite (TensorFlow Lite) format, with an icon and brief description emphasizing its use in optimizing TensorFlow models for efficient edge and mobile device inference.
Fig 10. TensorFlow Lite .tflite File Format for Mobile Model Deployment

Key Characteristics:

  • Offers low-latency execution with optional hardware acceleration (e.g., NNAPI, GPU, Edge TPU).
  • Suitable for Android and embedded systems.
  • Includes support for post-training quantization.

Considerations:

  • Not all TensorFlow operations are supported; models that use unsupported operations may need to be simplified or rewritten before conversion.
  • Debugging and inspection are more difficult due to the compact binary structure.
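
Post-training quantization to 8-bit integers typically relies on an affine mapping, real_value = scale * (quantized_value - zero_point). The following is a minimal plain-Python sketch of that arithmetic for a single value range. It is not the TensorFlow Lite converter, and the function names and the example range are hypothetical.

```python
def choose_qparams(rmin, rmax, qmin=-128, qmax=127):
    """Pick scale and zero_point so that [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain 0.0
    scale = (rmax - rmin) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Real value -> int8 code, clamped to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """int8 code -> approximate real value."""
    return scale * (q - zero_point)

# Hypothetical activation range observed during calibration.
scale, zero_point = choose_qparams(0.0, 2.55)
q_zero = quantize(0.0, scale, zero_point)
q_top = quantize(2.55, scale, zero_point)
q_one = quantize(1.0, scale, zero_point)
print(zero_point, q_zero, q_top, dequantize(q_one, scale, zero_point))
```

Each value now occupies one byte instead of four, and the zero point guarantees that 0.0 is represented exactly, which matters for padding and ReLU outputs.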

.engine (TensorRT)

TensorRT is NVIDIA’s high-performance Deep Learning Inference library, and .engine is its optimized runtime format. Rather than being used for training, models are typically exported from formats like .onnx or .pb and then compiled into .engine files for ultra-fast inference on NVIDIA GPUs.

A visual diagram showing how models from frameworks like PyTorch, TensorFlow, and MXNet are converted through NVIDIA TensorRT into .engine files, optimized for inference across GPU hardware like Tesla T4, Jetson TX2, and Ampere A100.
Fig 11. TensorRT .engine workflow for GPU-optimized Inference[Source]

Key Characteristics:

  • Converts pretrained models into a highly optimized execution graph.
  • Supports precision modes like FP32, FP16, and INT8.
  • Ideal for production deployment on NVIDIA hardware.

Common Use Cases:

  • Real-time object detection on Jetson devices.
  • High-throughput AI inference in cloud environments with GPUs.
  • Robotics, automotive AI, and other latency-sensitive tasks.

Considerations:

  • .engine files are hardware and version specific.
  • Requires regeneration if used on a different GPU architecture or TensorRT version.
  • Best used post-training for deployment only.

.mlmodel (Core ML)

Apple’s .mlmodel format is designed for deploying ML models within Apple’s ecosystem, including iOS, macOS, and watchOS. It allows seamless integration into apps using Xcode and Swift.

A banner from Apple’s Core ML tools GitHub page, highlighting coremltools for converting and managing .mlmodel files used to deploy machine learning models in Apple apps using Xcode and Swift.
Fig 12. Apple Core ML .mlmodel format for iOS Ecosystem Deployment[Source]

Key Characteristics:

  • Stores model specifications, inputs/outputs, and metadata in a protobuf structure.
  • Optimized for Apple Silicon performance.
  • Compatible with Core ML tools for model conversion and validation.

Considerations:

  • Restricted to Apple platforms.
  • Requires conversion tools (e.g., coremltools) for model preparation.

Choosing the Right Format

Your choice of model weight file format should depend on:

  • Framework and training pipeline
  • Target deployment platform (mobile, web, server)
  • Security and portability requirements
  • Performance constraints (e.g., quantization, RAM limits)

Scenario | Recommended Format
Training with PyTorch | .pt / .safetensors
TensorFlow model checkpointing | .ckpt
Model sharing in Keras | .h5
Cross-framework deployment | .onnx
Edge inference (Android) | .tflite
Secure sharing via Hugging Face | .safetensors
Quantized LLMs on CPU/GPU | .gguf
GPU-accelerated deployment | .engine
iOS/macOS application | .mlmodel

Conclusion

Understanding model weight file formats is essential for efficient model training, sharing, and deployment. Each format is optimized for specific use cases and environments. Whether you’re deploying a quantized LLM on a laptop, integrating AI into a mobile app, or sharing research models securely, the right format ensures performance, portability, and security.

Choose wisely, and your models will be as efficient in deployment as they are intelligent in inference.


