How we collect and process construction-material prices

Vật Giá Top publishes construction-material (VLXD, vật liệu xây dựng) prices for all 34 provinces of Vietnam. This page describes the full pipeline behind those numbers, from the official price bulletin released by a provincial Department of Construction to the figure you see on screen. The goal is for anyone to be able to verify every figure we publish.

One-sentence summary

A daily crawler downloads PDF / Excel price bulletins from each provincial Department of Construction → a rules-based classifier maps every line into one of 15 material groups → every price is converted into the canonical unit per TCVN standards → lines that can't be safely converted are dropped → results are published alongside the original bulletin period so users can cross-check against the source.

1. Data sources

Every price on Vật Giá Top is aggregated from the official VLXD price bulletins of the 34 provincial Departments of Construction. These are public records under Ministry of Construction Circular 11/2021/TT-BXD and subsequent amendments, published by each provincial department either quarterly or monthly.

Three concrete source layers:

  • Provincial Department of Construction portals: each province has a published page (e.g. soxaydung.hanoi.gov.vn, sxd.hochiminhcity.gov.vn). Most release a PDF or Excel file with a digitally signed bulletin.
  • Open data portals: a few provinces participate in Vietnam's Open Government Data initiative and publish CSV/JSON. Where available we prefer this — cleaner structure.
  • Official administrative documents: for off-cycle price announcements we go directly to the signed PDF.

Every province page on this site links back to the source bulletin on the originating Department of Construction site — you can verify directly.

2. Crawler — fetching bulletins automatically

Each province has its own crawler plugin (every department publishes in a slightly different format). Plugins run on a daily cron: they poll the portal, detect new files, and download them to our internal archive together with metadata (source URL, file hash, fetch timestamp, and the bulletin period stated in the file).

When a crawler can't parse a row (new file format, columns moved), we log the failure in the crawl log instead of guessing. Failures are handled manually by our engineering team before prices are pushed live.
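The two behaviours described above (detecting new files by hash, and logging a row instead of guessing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the production plugin code: the `BulletinFileMeta` shape, the `isNewFile` and `parseRow` helpers, the three-column row layout, and the "." thousands separator are all hypothetical.

```typescript
// Hypothetical shape of the metadata archived with each downloaded bulletin.
interface BulletinFileMeta {
  sourceUrl: string;
  sha256: string;         // hash of the downloaded bytes
  fetchedAt: string;      // ISO-8601 timestamp
  bulletinPeriod: string; // as stated in the file, e.g. "Q2/2026"
}

// A parsed row either succeeds or carries a reason for the crawl log.
type RowResult =
  | { ok: true; name: string; unit: string; price: number }
  | { ok: false; rawRow: string[]; reason: string };

// A file counts as new when its hash is not in the archive yet.
function isNewFile(archive: Map<string, BulletinFileMeta>, meta: BulletinFileMeta): boolean {
  return !archive.has(meta.sha256);
}

// Parse one row of [name, unit, price]; anything unexpected is reported, never guessed.
function parseRow(rawRow: string[]): RowResult {
  if (rawRow.length !== 3) {
    return { ok: false, rawRow, reason: `expected 3 columns, got ${rawRow.length}` };
  }
  const [name, unit, priceText] = rawRow.map((cell) => cell.trim());
  // Assumption: prices are whole VND amounts written with "." as thousands separator.
  const price = Number(priceText.replace(/\./g, ""));
  if (name === "" || unit === "" || !Number.isFinite(price) || price <= 0) {
    return { ok: false, rawRow, reason: "missing name/unit or non-numeric price" };
  }
  return { ok: true, name, unit, price };
}
```

A row whose columns have moved comes back as `{ ok: false, ... }` with its raw cells, which is what would be written to the crawl log for manual handling.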

3. Classification — mapping each row to a material group

A Department of Construction bulletin contains hundreds of SKU rows with varied naming ("Yellow fill sand", "Fine masonry sand", "Yellow concrete sand"…). Our classifier maps each row into one of 15 main material categories using a most-specific-first rule chain:

  • Compound/secondary products (waterproofing, adhesives) → skip.
  • Specific phrases first (e.g. "cement sand" → sand, not cement).
  • Generic catch-alls last (e.g. plain "cement" → xi-mang).

The ruleset is tested against ~3,000 real-world samples from all 34 provinces; when an SKU name doesn't match any rule, the plugin logs it for inclusion in the next ruleset update.
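A most-specific-first rule chain of this kind can be sketched as an ordered list where the first match wins. The patterns below reuse the English example names from this page, and the `RULES` list, the `classify` function, and every category slug other than `xi-mang` are hypothetical; the real ruleset is larger and works on Vietnamese SKU names.

```typescript
// "xi-mang" appears on this page; "cat" (sand) and "skip" are assumed slugs.
type Category = "xi-mang" | "cat" | "skip";

interface Rule {
  pattern: RegExp;
  category: Category;
}

// Order matters: the first matching rule wins.
const RULES: Rule[] = [
  // 1. Compound/secondary products are skipped outright.
  { pattern: /waterproofing|adhesive/i, category: "skip" },
  // 2. Specific phrases come before the generic words they contain.
  { pattern: /cement sand/i, category: "cat" },
  // 3. Generic catch-alls come last.
  { pattern: /cement/i, category: "xi-mang" },
  { pattern: /sand/i, category: "cat" },
];

// Returns the category, or null after recording the unmatched name for the next ruleset update.
function classify(name: string, unmatchedLog: string[]): Category | null {
  for (const rule of RULES) {
    if (rule.pattern.test(name)) return rule.category;
  }
  unmatchedLog.push(name);
  return null;
}
```

Without rule 2, "cement sand" would fall through to the generic `cement` rule and land in the wrong group, which is why ordering carries the logic here.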

4. Unit normalisation — single scale per category

The hardest part of VLXD aggregation: the same material is priced in different units across provinces (some report cement by tonne, others by 50 kg bag; rebar by bar in some, by kg in others). To make cross-province comparison possible we convert every price into the canonical unit per category:

| Material group | Canonical unit | Conversion rule |
| --- | --- | --- |
| Cement | tonne | 1 × 50 kg bag = 0.05 tonne |
| Rebar | kg | Drop "per bar" prices: converting them requires a density per diameter |
| Sand, stone, concrete | m³ | Drop "per truck" prices: truck volume varies |
| Brick | piece | Price per 1,000 pcs ÷ 1,000 = price per piece (bulletins commonly price per 1,000) |
| Roofing, glass, tile | m² | Sheet width × length |
| Paint | kg | Cans/buckets converted to kg by nominal volume |
| Wire, pipe | m | Reels normalised to metres by reel length |
| Wood | m³ | m² × thickness, or piece dimensions × count |

When a row's source unit can't be converted safely (e.g. "concrete pile" → m³ requires per-section densities), we drop the row with a warning instead of producing a wrong number. The aggregated table won't contain it, and the crawl log records it so we can ask the Department of Construction to add a conversion factor in the next bulletin.
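The convert-or-drop behaviour can be sketched as a lookup that only knows the conversions considered safe. Everything here is illustrative: `RULES_BY_CATEGORY`, `toCanonical`, and the unit spellings are hypothetical names, and only the cement, brick, and rebar rules from the table are shown.

```typescript
type Conversion =
  | { ok: true; unit: string; price: number }
  | { ok: false; reason: string };

interface CategoryRule {
  canonicalUnit: string;
  // Maps a source unit to a function giving the price per canonical unit.
  bySourceUnit: Map<string, (price: number) => number>;
}

const RULES_BY_CATEGORY = new Map<string, CategoryRule>([
  ["cement", {
    canonicalUnit: "tonne",
    bySourceUnit: new Map([
      ["tonne", (p: number) => p],
      ["50kg-bag", (p: number) => p * 20], // 1 bag = 0.05 tonne, so 20 bags per tonne
    ]),
  }],
  ["brick", {
    canonicalUnit: "piece",
    bySourceUnit: new Map([
      ["piece", (p: number) => p],
      ["1000-pcs", (p: number) => p / 1000],
    ]),
  }],
  ["rebar", {
    canonicalUnit: "kg",
    // "bar" is deliberately absent: it cannot be converted safely.
    bySourceUnit: new Map([["kg", (p: number) => p]]),
  }],
]);

// Converts a price to the canonical unit, or reports why the row must be dropped.
function toCanonical(category: string, sourceUnit: string, price: number): Conversion {
  const rule = RULES_BY_CATEGORY.get(category);
  const convert = rule?.bySourceUnit.get(sourceUnit);
  if (rule === undefined || convert === undefined) {
    return { ok: false, reason: `no safe conversion for ${category} priced per ${sourceUnit}` };
  }
  return { ok: true, unit: rule.canonicalUnit, price: convert(price) };
}
```

For example, cement at 90,000 VND per 50 kg bag becomes 1,800,000 VND per tonne, while a rebar price quoted per bar returns `ok: false` and would be dropped with a warning.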

Technical rules live in vatgiatop-crawler/src/engine/unit-conversion.ts (internal source). Adding a new material group means extending CANONICAL_UNIT_BY_CATEGORY and shipping a TypeORM migration that back-fills historical data into the new unit.

5. Aggregation & publication

After classification and unit normalisation, we compute per (material × province × period) statistics: low, high, average. These are the numbers shown on province and material pages.
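As a rough sketch of this aggregation step, the rows can be grouped by a composite key and reduced to low, high, and average. The `PriceRow` and `Stats` shapes and the `aggregate` function are hypothetical, and "average" is assumed here to be the plain arithmetic mean of the rows in a group.

```typescript
interface PriceRow {
  material: string;
  province: string;
  period: string; // original bulletin period, e.g. "Q2/2026"
  price: number;  // already in the canonical unit
}

interface Stats {
  low: number;
  high: number;
  average: number;
  count: number;
}

// Groups rows by (material, province, period) and computes low / high / average.
function aggregate(rows: PriceRow[]): Map<string, Stats> {
  const acc = new Map<string, { low: number; high: number; sum: number; count: number }>();
  for (const row of rows) {
    const key = `${row.material}|${row.province}|${row.period}`;
    const group = acc.get(key);
    if (group === undefined) {
      acc.set(key, { low: row.price, high: row.price, sum: row.price, count: 1 });
    } else {
      group.low = Math.min(group.low, row.price);
      group.high = Math.max(group.high, row.price);
      group.sum += row.price;
      group.count += 1;
    }
  }
  const result = new Map<string, Stats>();
  acc.forEach((group, key) => {
    result.set(key, {
      low: group.low,
      high: group.high,
      average: group.sum / group.count,
      count: group.count,
    });
  });
  return result;
}
```

Because the period is part of the key, figures from different bulletin periods are never mixed into one statistic.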

Each table always shows the original bulletin period (e.g. "Q2/2026", "April 2026"). If a provincial department hasn't released a new quarter yet, we keep the previous period and mark it clearly — we never falsely "refresh" using stale data.

When the crawler detects a new period, the table auto-updates within 24 hours. CDN caches are invalidated immediately on database write.
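One way to decide whether an incoming bulletin really is a new period is to turn period labels into comparable numbers. The sketch below assumes labels shaped like the two examples on this page ("Q2/2026", "April 2026"); `periodStart` and `isNewerPeriod` are hypothetical helpers, and real bulletins may use other label formats.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Returns year * 12 + zero-based starting month, or null if the label is not recognised.
function periodStart(period: string): number | null {
  const text = period.trim();
  const quarter = /^Q([1-4])\/(\d{4})$/.exec(text);
  if (quarter) {
    return Number(quarter[2]) * 12 + (Number(quarter[1]) - 1) * 3;
  }
  const month = /^([A-Za-z]+)\s+(\d{4})$/.exec(text);
  if (month) {
    const index = MONTHS.indexOf(month[1].toLowerCase());
    if (index >= 0) return Number(month[2]) * 12 + index;
  }
  return null;
}

// True only when both labels are understood and the incoming one starts later.
function isNewerPeriod(published: string, incoming: string): boolean {
  const current = periodStart(published);
  const candidate = periodStart(incoming);
  if (current === null || candidate === null) return false;
  return candidate > current;
}
```

An unrecognised label returns `false` rather than a guess, so the previously published period stays on display until someone checks the file.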

Margins of error and data limitations

Department of Construction prices are published reference prices, not actual dealer transaction prices. The two typically differ by 5–15% because of:

  • Freight from warehouse/plant to site (especially for bulky materials like sand, stone, ready-mix).
  • Discounts/promotions on order volume and timing (dealers often discount end-quarter to clear stock).
  • Raw-material input swings between bulletin periods — rebar can move 5–10% in a single week.
  • Contract vs ex-warehouse pricing at tier-2/3 dealers.

For that reason every price on Vật Giá Top is clearly labelled "for reference only" and we always recommend users verify with at least 3 local dealers before signing a contract. This is detailed in our Terms of Service.

What happens when you report an error

If you spot a wrong price (e.g. "PCB30 cement in Hà Nội at 7 M VND/tonne" when the real price is 1.5 M), email [email protected] with:

  • The page URL.
  • A screenshot of the wrong figure.
  • The correct price (if you know it) and your verification source (dealer catalog, contract).

Engineering verifies and fixes within 48 business hours. If the bug is a crawler-side misread of a Department of Construction bulletin, we patch the rule so other provinces don't hit the same trap.

Methodology updates

This page is reviewed at least every 6 months. When the process materially changes (new data source, changed canonical unit, new material category), the "Last updated" entry below changes with a brief changelog.

Last updated: May 25, 2026 — initial version.
Editorial owner: Vật Giá Top editorial team.
Technical (crawler) owner: Vật Giá Top engineering team (contact via the Contact page).
