Three tiers of local LLM hardware: turnkey boxes, retail GPUs and grey-market cards, imported into South Africa

Importing AI and Local-LLM Hardware into South Africa

Infographic showing the three tiers of local LLM hardware imported into South Africa: turnkey AI boxes like DGX Spark and Strix Halo, retail GPUs like the RTX 4090, and grey-market parts like modded 48GB cards and SXM2 adapter boards

Importing AI hardware into South Africa means clearing two obstacles: thin local supply and a customs process that treats high-value electronics with suspicion. The hardware in question is the GPUs, mini-supercomputers and high-VRAM cards that local LLMs run on. This guide covers what people are actually running local models on in 2026, the grey-market parts worth knowing about, and how Scott’s Shipping Services (SSS) brings it all in for one all-inclusive price.


The quick version

Local LLM hardware is the compute you need to run open-weight models like DeepSeek, Qwen and Llama on your own machine instead of a cloud API. The deciding spec is memory: how much VRAM or unified memory you can put in front of the model.

The hardware splits into three tiers. Turnkey boxes like the NVIDIA DGX Spark and AMD Strix Halo mini PCs ship with 128GB of unified memory. Retail GPUs like the RTX 4090 and 5090 remain the single-card workhorses. Grey-market gear covers China-modded 48GB RTX 4090s and ex-datacentre cards mounted on adapter boards.

Almost none of it sits on a South African shelf, and the modded parts carry real risk. SSS sources the hardware, handles the customs classification, duties and VAT, and delivers it landed, all under one quote.

Bar chart comparing VRAM sizes for local LLM hardware: 24GB, 48GB, 128GB and 424GB

What is driving the demand

Two things turned self-hosting from a niche hobby into a queue of buyers. The first is the open Chinese models. DeepSeek and Qwen now ship open-weight releases that hold their own against commercial APIs, and at 4-bit quantisation a capable model like the 32B DeepSeek-R1 distill fits on a single 24GB card. Good models that run on hardware you can own are the whole driver.

The second is Odysseus, the self-hosted AI workspace released by Felix Kjellberg (PewDiePie) on 31 May 2026. It runs local backends like Ollama, vLLM and llama.cpp, and his own rig of eight modded RTX 4090s plus two RTX 4000 Ada cards put a real number on what a serious local setup looks like. The result is a wave of people pricing their own builds.


What people actually run local LLMs on

Turnkey boxes: DGX Spark and Strix Halo

The NVIDIA DGX Spark is built around the Grace Blackwell GB10 chip with 128GB of unified memory, enough to load models up to roughly 200 billion parameters in NVFP4. Throughput runs from around 35 to 80 tokens per second depending on the model, with memory bandwidth near 273 GB/s as the main limit. Even abroad it has been supply-constrained, with multi-week lead times.
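To see why memory bandwidth is the limit, here is a minimal sketch of a common rule of thumb: during generation, each new token needs roughly one read of every active weight, so bandwidth divided by the bytes read per token gives a ceiling on speed. The 273 GB/s figure is from the paragraph above. The function name, the 4-bit setting and the three model sizes are illustrative assumptions, and the estimate ignores cache reads, compute time and software overhead.

```python
def decode_ceiling_tokens_per_s(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bits_per_weight: float,
) -> float:
    """Rough upper bound on generation speed for a bandwidth-bound machine.

    Assumes every active weight is read from memory once per token.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Illustrative model sizes only; "active" parameters matter for
# mixture-of-experts models, which read a fraction of their weights per token.
examples = [
    ("dense 32B", 32),
    ("dense 70B", 70),
    ("mixture-of-experts, 3B active", 3),
]
for name, active_b in examples:
    ceiling = decode_ceiling_tokens_per_s(273, active_b, 4)
    print(f"{name}: about {ceiling:.0f} tokens/s at most")
```

On these assumptions a dense 32B model tops out near 17 tokens per second on a 273 GB/s machine, while a model that only reads a few billion parameters per token has far more headroom. Real figures depend on the model and the runtime.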

The AMD Strix Halo (Ryzen AI Max+ 395) pairs 16 Zen 5 cores with an integrated Radeon 8060S and up to 128GB of shared memory, leaving roughly 115 to 120GB usable for inference. It loads a 70B model without splitting it across cards and pushes a 30B model at around 100 tokens per second. It has become the practical all-rounder for a quiet desktop or air-gapped build.

Retail GPUs: RTX 4090 and 5090

The RTX 4090 with 24GB remains the best single-card option, handling 32B models at Q4 with room for cache. The RTX 5090 at 32GB lifts that ceiling, and two cards together open up larger models. The constraint in South Africa is not whether they work but whether you can get current stock at a sane price.
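The "32B at Q4 with room for cache" claim can be checked with back-of-envelope arithmetic. The sketch below assumes a Q4 format averages about 4.5 bits per weight once scaling data is counted; that figure and the function names are assumptions for illustration, not a spec for any particular quantisation scheme.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB."""
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left over for the cache and runtime after the weights are loaded."""
    return vram_gb - weights_gb(params_billion, bits_per_weight)


Q4_BITS = 4.5  # assumed average bits per weight for a Q4 format

print(weights_gb(32, Q4_BITS))       # 18.0 GB of weights for a 32B model
print(headroom_gb(24, 32, Q4_BITS))  # 6.0 GB left on a 24GB card
print(headroom_gb(32, 32, Q4_BITS))  # 14.0 GB left on a 32GB card
```

A negative headroom means the model does not fit on that card without offloading or a smaller quantisation.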

Decision-flow infographic matching model size to hardware: up to 32B on a 24GB card, up to 70B on 128GB unified memory, bigger models on a multi-GPU build
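The same decision flow can be written as a small lookup. This is only a sketch of the thresholds named in the infographic, assuming roughly 4-bit quantisation; the function name and the tier labels are illustrative.

```python
def pick_tier(params_billion: float) -> str:
    """Map a model size to the hardware tier from the decision flow."""
    if params_billion <= 0:
        raise ValueError("model size must be positive")
    if params_billion <= 32:
        return "single 24GB card"
    if params_billion <= 70:
        return "128GB unified-memory box"
    return "multi-GPU build"


for size in (14, 32, 70, 235):
    print(f"{size}B -> {pick_tier(size)}")
```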

The grey market: modded cards and adapter boards

This is where the buzz gets loud and the risk gets real.

Infographic on grey-market modded GPUs showing they run hot, are loud, and come with no warranty

Modded 48GB RTX 4090s. NVIDIA never made a 48GB 4090. These are custom builds out of China that reball the memory to double the VRAM, sold mostly on Alibaba and AliExpress as blower-style cards. On paper a 48GB card for large-model work is tempting. In practice, independent reviewers have measured VRAM temperatures above 105°C, blower noise past 60 dB, and stress-test failure rates several times higher than stock cards. The cards are often built on second-hand chips and carry no warranty.

SXM2 and SXM4 adapter boards. These breakout boards drop ex-datacentre modules like the Tesla P100 and V100 (SXM2) and the A100 (SXM4) into a normal PCIe slot. The VRAM-per-rand maths looks great, but power delivery, cooling and BIOS quirks make these a project build, not a plug-and-play card.

The grey market is exactly where buying blind costs money. A card that arrives dead, or cooks itself in a fortnight, leaves you with no recourse when you ordered it solo off a marketplace listing. This is the part of the market where knowing which sellers and which parts are worth touching is the whole value.


Memory and storage worth importing too

Compute is only half a build. Running a model larger than your VRAM means offloading layers to system memory, so DDR5 capacity and speed matter for anyone pushing past their card. Model weights also have to live somewhere fast: a 70B model at Q4 is around 40GB, and the largest open models run into hundreds of gigabytes, so a quick NVMe SSD earns its place.
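To put a number on offloading, the sketch below splits a model's layers between the GPU and system memory. It uses the 40GB figure for a 70B model at Q4 from the paragraph above and assumes 80 equally sized layers, a 24GB card and 2GB held back for cache and runtime. The layer count, the reserve and the function name are assumptions, and real layers are not perfectly uniform.

```python
import math


def split_layers(
    model_gb: float,
    n_layers: int,
    vram_gb: float,
    reserve_gb: float = 2.0,
) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_in_system_ram) for a simple even split."""
    gb_per_layer = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / gb_per_layer))
    return on_gpu, n_layers - on_gpu


on_gpu, in_ram = split_layers(model_gb=40, n_layers=80, vram_gb=24)
print(on_gpu, in_ram)  # 44 layers on the card, 36 in system memory
```

In llama.cpp, the number of layers kept on the GPU is what the `--n-gpu-layers` option sets. Every layer left in system memory runs at DDR5 speed rather than VRAM speed, which is why RAM capacity and bandwidth matter once a model outgrows the card.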

It is usually worth importing the RAM and storage on the same shipment as the compute. One consignment, one customs entry, one delivery.


Why this hardware is hard to get in South Africa

Local stock is thin past a retail 4090. Turnkey AI boxes and datacentre-class cards rarely reach South African shelves at all, and when they do the markup reflects the scarcity.

High-value electronics also draw SARS attention. Correct tariff classification, an honest customs value, duties where they apply and VAT at 15% are all part of the entry. Get the paperwork wrong and a costly parcel sits in limbo while it is queried. For how that valuation is worked out, see our guide on how customs value is determined in South Africa.
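As a rough illustration of the tax arithmetic, the sketch below applies VAT at 15% to a hypothetical customs value. It assumes the usual 10% uplift on the customs value when working out the VAT base for imports, which should be confirmed against current SARS guidance. The duty rate is left as an input because it depends on the tariff classification. The R60,000 value, the 0% duty rate and the function name are placeholders, not a quote.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.15")    # VAT at 15%, as stated in the guide
VAT_UPLIFT = Decimal("0.10")  # assumed uplift on customs value for the VAT base
CENTS = Decimal("0.01")


def import_taxes(customs_value_zar: Decimal, duty_rate: Decimal) -> dict[str, Decimal]:
    """Estimate duty and VAT in rand for a given customs value and duty rate."""
    duty = customs_value_zar * duty_rate
    vat_base = customs_value_zar * (1 + VAT_UPLIFT) + duty
    vat = vat_base * VAT_RATE
    return {
        "duty": duty.quantize(CENTS, rounding=ROUND_HALF_UP),
        "vat": vat.quantize(CENTS, rounding=ROUND_HALF_UP),
        "total_tax": (duty + vat).quantize(CENTS, rounding=ROUND_HALF_UP),
    }


# Hypothetical example: R60,000 customs value with a placeholder 0% duty rate.
print(import_taxes(Decimal("60000"), Decimal("0")))
```

This covers only the duty and VAT portion of an entry. The landed cost also includes the hardware itself, shipping and delivery.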

Buying grey-market parts yourself stacks a second risk on top: if the card is faulty on arrival, you are arguing with an overseas marketplace seller, not a local supplier.


How SSS gets it in for you

Scott’s Shipping Services is an end-to-end import service. We buy the hardware, ship it, clear it through customs, pay the duties and VAT, and deliver it to your door, all under one all-inclusive quote. We do not do clearing-only work or handle goods you have already bought.

Sourcing judgement. We work with reputable suppliers, and where a build calls for grey-market parts we know which sellers and which cards are worth the risk and which to avoid.

Clean customs entries. High-value electronics get the correct HS classification and an honest customs value before they ship, so clearance stays fast and defensible if SARS asks questions.

One point of contact. From the seller’s warehouse to your door, one team handles the whole chain. If you want us to buy specific items on your behalf, that is our international shopping concierge; for full builds and heavier consignments, our freight import service covers it. New to the process? Start with how to import goods to South Africa.


Frequently asked questions

Can you import a DGX Spark or Strix Halo machine to South Africa?

Yes. We source turnkey AI machines like the NVIDIA DGX Spark and AMD Strix Halo mini PCs from overseas suppliers and handle the full import, including customs, VAT and delivery. Lead times depend on stock abroad, which has been tight, so it is worth asking early.

Are the China-modded 48GB RTX 4090 cards worth it?

Sometimes, with eyes open. They offer a lot of VRAM, but they are custom builds with reported thermal and reliability problems and no warranty. If a build genuinely needs that much VRAM, we can advise on whether a modded card or a different route makes more sense, and source accordingly.

Can you get ex-server GPUs like the V100 or A100 and the adapter boards?

Yes. We can source datacentre cards and the SXM-to-PCIe adapter boards they need. These are involved builds, so we will be straight with you about the power and cooling they demand before you commit.

Do I pay duty and VAT on AI hardware?

VAT at 15% applies to imported hardware. Whether duty applies, and at what rate, depends on the exact tariff classification of the item. We confirm the classification and the full landed cost before you commit, so there are no surprises at clearance.

What does it cost to import a local LLM rig?

It depends on the parts, the source country and the shipment weight. We quote the whole job as one figure, with the hardware, shipping, customs, duties, VAT and delivery included. Get a quick estimate to see your landed cost.

Can SSS advise on what hardware to buy?

We are importers, not a build shop, but we know the market well. Tell us the models you want to run or the rig you are copying, and we will help you land the right parts reliably.


Useful resources

SSS: Why run a local LLM? The case for owning your AI hardware

SSS: How to import goods to South Africa

SSS: How customs value is determined in South Africa

SSS: 5 common importing mistakes and hidden costs


Planning an AI or local-LLM build? Use our online calculator for a quick estimate, or get in touch to source the parts.


About the Author

With years of hands-on experience in international shipping and South African customs, Scott started SSS to give individuals and businesses a simpler, more transparent way to import. He and his team have handled thousands of shipments from six continents, building a reputation for reliability, compliance, and honest pricing.