Optimize for NEAR Storage On-Chain (Reducing Contract Size)


Note: we leave some "questions" for you to experiment with yourself and see what the end results are.

Storage on chain is expensive: for every 100 kB used, 1 NEAR is locked. The locked NEAR is returned if you free up the storage. There are two main causes of storage usage:

  • Transactions: Each transaction produces a receipt, which is saved on-chain and hence takes up storage. Receipts have different lengths (a transaction that calls more functions produces a longer receipt; one that calls a single function produces a shorter one), so the amount of storage used differs. Just as you need more paper to print a long receipt, you use more storage space to store a longer receipt.
  • Smart contract deployment: When you deploy a smart contract, it takes up roughly the size of the contract (this may differ slightly from the local wasm file, but the difference is small).

For transactions, unless you make fewer of them, you can't really reduce the storage they take.

(Question: Can we delete our top-level account via near-cli? If we delete it, is the storage cost freed up? (P.S. check the explorer after deleting a sub-account.) Certainly the transactions won't be deleted, so will they still lock the storage cost, never to be released?)

Here, we shall talk about optimizing your smart contract size.

Note: it is said that, for the same code, development in AssemblyScript leads to a smaller wasm size than Rust. I'm not an AS developer (I prefer Rust), so I won't speak to that; in fact, there's no need to, since it's the compiler's job anyway.

Optimize Contract Size

Persistent Storage

NEAR has a list of collections, found here. These collections cache data to reduce gas fees. The trade-off questions are: how much gas do you want to save? How much storage cost are you willing to pay for that gas reduction? How many unique persistent collections do you need?

The difference is explained in the link above (quoted below):

It's important to keep in mind that when using std::collections (i.e. HashMap, etc. Rust natives), each time state is loaded, all entries in the data structure will be read eagerly from storage and deserialized. This will come at a large cost for any non-trivial amount of data, so to minimize the amount of gas used the SDK collections should be used in most cases.

Consider this example: we have a main contract with a dictionary/hash/map linking an ArticleId to an Article. The Article is a struct with its attributes.

use near_sdk::AccountId;
use near_sdk::collections::LookupMap;
use std::collections::HashMap;

pub type ArticleId = String;

pub struct Contract {
    pub article_by_id: LookupMap<ArticleId, Article>,
}

pub struct Article {
    pub owner_id: AccountId,
    pub article_id: ArticleId,
    pub royalty: HashMap<AccountId, u16>,
}
We see that Contract.article_by_id uses a LookupMap while Article.royalty uses a HashMap. Let's discuss why we don't use other types.

Consider article_by_id: when we create an article, its id is unique and specific to that article, and it is stored forever. A LookupMap stores each entry under its own storage key and reads entries lazily, so we only pay gas to deserialize the entries we actually access.

As mentioned before, everything is deserialized on every read when using a HashMap, and LookupMap<ArticleId, Article> holds a non-trivial amount of data (once lots of Articles have been created), so it should be kept in an on-chain SDK collection.

Now, why are we using LookupMap instead of UnorderedMap? The latter additionally offers iteration over the collection, which we don't need. If you need iteration, use UnorderedMap.

Then for royalty, we use HashMap. The thing is, we have lots of Articles, each with its own unique Article.royalty. If these were SDK collections, each unique Article.royalty would need its own storage key.

P.S. If you didn't know yet: each NEAR SDK collection needs a unique storage key. If two NEAR SDK collections share the same key, they share the same data (regardless of whether sharing storage between two different types, like a Vector and a LookupMap, works or fails).

Let's illustrate the shared-storage-key scenario. Say we create two articles, Article A and Article B. These are their Article.royalty equivalents.

// Values are percentage, 100.00% == 10_000.

// Article A
{
"alice.near": 1_000,
"bob.near": 500,
}

// Article B
{
"alice.near": 1_500,
"charlie.near": 2_000,
}
Notice that alice.near appears in both royalty maps. If both maps used the same storage key, they would clash: alice.near is already stored under that key for Article A, and writing Article B's value to the same key silently replaces it. We want the two maps to be independent of each other, but the same storage key used by two different maps means they share the same values.

(Question: can you fetch the values stored in Article A if you initialize Article B with the same storage key?)

The solution is to create a separate storage key for each collection. But if we have 1 million articles, we'd need 1 million different collection keys to store them separately. That's impractical! Hence it makes sense to store them as HashMaps instead. Furthermore, they are trivial: royalty maps are typically designed to cap how much data they can hold, so fetching and deserializing them is cheap. This strengthens the choice of HashMap over the equivalent SDK collection, despite the slightly higher gas usage (negligible, since the collection is so small).

In conclusion, when designing your smart contract, choose between NEAR SDK collections and Rust collections based on how trivial the data is and how many copies of the same map you need.

Code reduction

The first code we write is bad; in fact, it's just a draft. We need some refactoring to delete unnecessary code. There's a trade-off between easy-to-understand code and storage optimization.

For example, suppose we have a Payout object, and it's only used in a single function.

use near_sdk::json_types::U128;
use std::collections::HashMap;

pub struct Payout {
    pub payout: HashMap<ArticleId, U128>,
}

impl Default for Payout {
    fn default() -> Self {
        Self {
            payout: HashMap::new(),
        }
    }
}

Why can’t we just define a

HashMap::new()

in the specific function that uses it? Of course, if you do the latter, the code is harder to follow. The former is easier to understand, from an object-oriented perspective. However, (significantly) more code leads to more storage used after compilation to WASM. So it's time for some trade-offs.

In my opinion, readability is more important than storage optimization. If required, keep the original readable version and clone it for optimization each time you make changes, so people can understand what you're doing by reading the original code. Of course, this means more work for you.

(Question: How much space is saved if you replace the former with the latter? If you have a similar scenario in your program, try optimizing it with less code and compare the compiled sizes. Is there a difference? Sometimes there is, sometimes there isn't. Where there isn't, prefer to keep the readable code for easier debugging in the future.)

Wasm-Opt

After you've compiled the optimized release version, you can further reduce contract size with wasm-opt. To install, download the binary here for your OS and unzip it. Inside there's a "bin" folder; copy the exact path to that folder and add it to your environment PATH. Then call wasm-opt from the command line/terminal to check that it runs. If not, search online for a fix (the two most common problems: you added it to the wrong environment variable, or your terminal was already open and hasn't picked up the new PATH).

Running these would reduce the file size:

#!/bin/bash
set -e

export WASM_NAME=tipping.wasm
mkdir -p res

RUSTFLAGS='-C link-arg=-s' cargo build --target wasm32-unknown-unknown --release
cp target/wasm32-unknown-unknown/release/$WASM_NAME ./res/
wasm-opt -Os -o res/output_s.wasm res/$WASM_NAME
ls -lh res

Here we assume the original contract compiles to tipping.wasm (the Cargo.toml name is tipping). The optimized output is named output_s.wasm. We then run ls (on Linux) to compare file sizes; the optimized one should be smaller.

Note: you can also use the -Oz flag, but I found it unnecessary; for the project I work on, it doesn't lead to a smaller file.

Important note: the RUSTFLAGS should be "link-arg=-s". If you accidentally change it to "-z", you might have a big problem: at least for me, it generated a far bigger wasm file. Experiment and check with your own project.

Perhaps in the future .wasm.gz files will be allowed so you can compress further. Currently (I tried it) the chain cannot deserialize a gzipped file; only .wasm files are supported on-chain.

Cargo.toml
These are the usual flags for Cargo.toml.

[profile.release]
codegen-units = 1
opt-level = "s"
lto = true
debug = false
panic = "abort"
overflow-checks = true

You could choose opt-level = "z" too; it may or may not generate a smaller binary.

Some other small wins

Avoid String Formatting
format! and to_string() bring code bloat, so use static strings (&str) whenever possible.

Removing rlib if not required
If you don’t need to do simulation testing, remove rlib.

Use Borsh serialization
Prefer not to use serde when possible. Here is a page on how to override the serialization protocol.

Avoid Rust standard assertions and panic macro
These contain information about the error returned, which introduces unnecessary bloat. Instead, try these methods:

// Instead of this
assert_eq!(
    contract_owner,
    predecessor_account,
    "ERR_NOT_OWNER"
);

// Try this (for all versions of near-sdk-rs)
if contract_owner != predecessor_account {
    env::panic_str("ERR_NOT_OWNER")
}

// For near-sdk-rs v4.0 pre-release
use near_sdk::require;
require!(
    contract_owner == predecessor_account,
    "ERR_NOT_OWNER"
);

require! is a lightweight macro introduced in near-sdk-rs v4 (and also my favorite macro) to replace the assert! macro. It works mostly like assert!, except for one small difference: it does not support format arguments directly.

// assert! can take format arguments
assert!(
    some_conditions,
    "{}: {}",
    error_type,
    error_message
);

// require! cannot
require!(
    some_conditions,
    format!( // you need format! yourself
        "{}: {}",
        error_type,
        error_message
    )
);

And as mentioned before about avoiding string formatting, it's best to hard-code the message. Of course, if you really need it, sacrificing some bytes for format! is fine: it takes up negligible space if you don't use it extensively.

Don’t use .expect()
Instead, use unwrap_or_else. I wrote a helper function in the near-helper crate which you might want to check out.

Otherwise, you could always put this in internal.rs:

fn expect_lightweight<T>(option: Option<T>, message: &str) -> T {
    option.unwrap_or_else(||
        env::panic_str(message)
    )
}

// instead of:
let owner_id = self.owner_by_id
    .get(&token_id)
    .expect("Token not found");

// use this:
let owner_id = expect_lightweight(
    self.owner_by_id.get(&token_id),
    "Token not found"
);

Avoid Panicking

Here are some common causes of panics:

  • Indexing a slice out of bounds: my_slice[i]
  • Division by zero: dividend / 0
  • unwrap(): prefer unwrap_or, unwrap_or_else, or other safer methods that don't panic. In near-sdk there's also env::panic_str (env::panic is deprecated), and they mention here that it could be preferred. You could also use an old-fashioned match to handle things and see if it works better than panic_str; if not, use panic_str for easier code understanding.

Try to implement workarounds that return None, or enforce no panicking while developing the contract.

Lower level approaches

Check out the link in the references for other ways to reduce contract size. (Most of what's listed there isn't covered in this article.)

Do bear in mind these examples are no longer updated, so you will need to adapt them to the latest changes yourself.

The list is here:

  • Tiny Contract (deprecated)
  • Contract for fuzzing-rs (you can view the master branch; this is a pinned branch to prevent it being removed in the future). I don't know what this contract does, nor what "fuzzing" means; you'll have to work that out yourself.
  • Eugene's example of a fast fungible token; you can also watch the YouTube video here. He implements it without near-sdk: a less pleasant programming experience, but optimized for size.
  • Aurora uses rjson as a lightweight JSON serialization crate. It has a smaller footprint than the serde currently packaged with the Rust SDK. See this example; the reader will have to derive its usage themselves. Another crate to consider is miniserde; example here.

The Wasm-snip tool

It can be useful to replace unused functions with the unreachable instruction. Usually you don't need to do this; only go ahead if you really need to save that space.

They mention the tool is useful for removing the panicking infrastructure too!

You could also run wasm-opt with the --dce flag after snipping, so the snipped functions get removed.

Conclusion

There are lots of ways to optimize a contract. Some optimizations are easy and require no changes; others have compromises, and you must decide whether they're worth the trade-offs. In general, unless your contract is utterly big (usually the result of too many lines of code, in which case you're encouraged to check whether all of it is necessary), simple measures like wasm-opt and sensible persistent storage choices should be sufficient.

References

Near SDK documentation on reducing contract size
Reducing Wasm size for Rust
