One-sentence summary
A daily crawler downloads PDF / Excel price bulletins from each provincial Department of Construction → a rules-based classifier maps every line into one of 15 material groups → every price is converted into the canonical unit per TCVN standards → lines that can't be safely converted are dropped → results are published alongside the original bulletin period so users can cross-check against the source.
1. Data sources
Every price on Vật Giá Top is aggregated from the official VLXD (vật liệu xây dựng, i.e. construction materials) price bulletins of the 34 provincial Departments of Construction. These are public records under Ministry of Construction Circular 11/2021/TT-BXD and subsequent amendments, and each provincial department publishes them either quarterly or monthly.
Three concrete source layers:
- Provincial Department of Construction portals: each province publishes its bulletins on its own page (e.g. soxaydung.hanoi.gov.vn, sxd.hochiminhcity.gov.vn). Most release the digitally signed bulletin as a PDF or Excel file.
- Open data portals: a few provinces participate in Vietnam's Open Government Data initiative and publish CSV/JSON. Where available we prefer this source because its structure is cleaner.
- Official administrative documents: for off-cycle price announcements we go directly to the signed PDF.
Every province page on this site links back to the source bulletin on the originating Department of Construction site — you can verify directly.
2. Crawler — fetching bulletins automatically
Each province has its own crawler plugin (every department publishes in a slightly different format). Plugins run on a daily cron: they poll the portal, detect new files, download to our internal archive with metadata (source URL, file hash, fetch timestamp, bulletin period embedded in the file).
When a crawler can't parse a row (new file format, columns moved), we log the failure in the crawl log instead of guessing. Failures are handled manually by our engineering team before prices are pushed live.
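The archive step described above can be sketched as follows. This is a minimal illustration, not our production plugin code: the `ArchivedBulletin` shape, the `archiveIfNew` helper, and the choice of SHA-256 as the file hash are all assumptions made for the example. It shows how storing a content hash next to the source URL, fetch timestamp, and bulletin period lets a daily poll tell a genuinely new file from one it has already downloaded.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one archive entry; field names are illustrative.
interface ArchivedBulletin {
  sourceUrl: string;
  sha256: string;         // hex digest of the downloaded bytes (assumed hash algorithm)
  fetchedAt: string;      // ISO 8601 fetch timestamp
  bulletinPeriod: string; // period as embedded in the file, e.g. "Q2/2026"
}

function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Stores the file and returns its entry, or returns null when the exact same
// bytes are already in the archive (i.e. the portal has nothing new).
function archiveIfNew(
  archive: Map<string, ArchivedBulletin>,
  sourceUrl: string,
  content: string | Uint8Array,
  bulletinPeriod: string,
  now: Date = new Date(),
): ArchivedBulletin | null {
  const sha256 = sha256Hex(content);
  if (archive.has(sha256)) {
    return null;
  }
  const entry: ArchivedBulletin = {
    sourceUrl,
    sha256,
    fetchedAt: now.toISOString(),
    bulletinPeriod,
  };
  archive.set(sha256, entry);
  return entry;
}
```

Keying the archive on the content hash rather than the URL means a bulletin re-uploaded under a new file name is not treated as a new period, while a silently corrected file at the same URL is.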
3. Classification — mapping each row to a material group
A Department of Construction bulletin contains hundreds of SKU rows with varied naming ("Yellow fill sand", "Fine masonry sand", "Yellow concrete sand"…). Our classifier maps each row into one of 15 main material categories using a most-specific-first rule chain:
- Compound/secondary products (waterproofing, adhesives) → skip.
- Specific phrases first (e.g. "cement sand" → sand, not cement).
- Generic catch-alls last (e.g. plain "cement" → xi-mang, the slug of the cement group).
The ruleset is tested against ~3,000 real-world samples from all 34 provinces; when an SKU name doesn't match any rule, the plugin logs it for inclusion in the next ruleset update.
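A most-specific-first rule chain of this kind can be sketched in a few lines. The patterns below are simplified English stand-ins, and every name except the `xi-mang` slug (the `Rule` type, the `sand` slug, the `classify` function, the in-memory log) is hypothetical; the real ruleset is far larger. The point is only the ordering: the first matching rule wins, so skip rules and specific phrases must sit above the generic catch-alls.

```typescript
type Classification =
  | { kind: "matched"; category: string }
  | { kind: "skipped" }
  | { kind: "unmatched" };

interface Rule {
  pattern: RegExp;
  category: string | null; // null means "skip this row"
}

// Ordered most-specific-first. Patterns and the "sand" slug are illustrative.
const RULES: Rule[] = [
  { pattern: /waterproofing|adhesive/i, category: null }, // compound products: skip
  { pattern: /cement sand/i, category: "sand" },          // specific phrase first
  { pattern: /cement/i, category: "xi-mang" },            // generic catch-all
  { pattern: /sand/i, category: "sand" },                 // generic catch-all
];

function classify(name: string, unmatchedLog: string[]): Classification {
  for (const rule of RULES) {
    if (rule.pattern.test(name)) {
      return rule.category === null
        ? { kind: "skipped" }
        : { kind: "matched", category: rule.category };
    }
  }
  // No rule matched: record the name for the next ruleset update.
  unmatchedLog.push(name);
  return { kind: "unmatched" };
}
```

In this sketch, moving the "cement sand" rule below the plain "cement" rule would send cement sand to the cement group, which is exactly the mistake the ordering is there to prevent.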
4. Unit normalisation — single scale per category
The hardest part of VLXD aggregation: the same material is priced in different units across provinces (some report cement by tonne, others by 50 kg bag; rebar by bar in some, by kg in others). To make cross-province comparison possible we convert every price into the canonical unit per category:
| Material group | Canonical unit | Conversion rule |
|---|---|---|
| Cement | tonne | 1 × 50 kg bag = 0.05 tonne |
| Rebar | kg | Drop "per bar" prices: converting them needs the bar length and the mass per metre for each diameter |
| Sand, stone, concrete | m³ | Drop "per truck" — truck volume varies |
| Brick | piece | Price per 1,000 pcs ÷ 1,000 = price per piece (bulletins commonly price per 1,000) |
| Roofing, glass, tile | m² | Sheet width × length |
| Paint | kg | Cans/buckets converted to kg by nominal volume |
| Wire, pipe | m | Reels normalised to metres by reel length |
| Wood | m³ | m² × thickness, or piece dimensions × count |
When a row's source unit can't be converted safely (e.g. "concrete pile" → m³ requires per-section densities), we drop the row with a warning instead of producing a wrong number. The aggregated table won't contain it, and the crawl log records it so we can ask the Department of Construction to add a conversion factor in the next bulletin.
Technical rules live in vatgiatop-crawler/src/engine/unit-conversion.ts (internal source). Adding a new material group means extending CANONICAL_UNIT_BY_CATEGORY and shipping a TypeORM migration that back-fills historical data into the new unit.
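The convert-or-drop behaviour in this section can be illustrated with a small sketch. Only the name `CANONICAL_UNIT_BY_CATEGORY` comes from our source tree; its shape here, the second lookup table, the unit labels, and the `toCanonicalPrice` function are assumptions for the example and cover just three groups. A source unit that is missing from the table is never guessed at: the function returns a warning instead of a price.

```typescript
type ConversionResult =
  | { ok: true; unit: string; price: number }
  | { ok: false; warning: string };

// Assumed shape; category keys other than "xi-mang" are hypothetical.
const CANONICAL_UNIT_BY_CATEGORY: Record<string, string> = {
  "xi-mang": "tonne",
  brick: "piece",
  sand: "m3",
};

// How many canonical units one source unit contains (hypothetical table).
const CANONICAL_QTY_PER_SOURCE_UNIT: Record<string, Record<string, number>> = {
  "xi-mang": { tonne: 1, "bag-50kg": 0.05 }, // 1 x 50 kg bag = 0.05 tonne
  brick: { piece: 1, "1000-pcs": 1000 },     // bulletins commonly price per 1,000
  sand: { m3: 1 },                           // no "truck" entry: truck volume varies
};

function toCanonicalPrice(
  category: string,
  sourceUnit: string,
  price: number,
): ConversionResult {
  const unit = CANONICAL_UNIT_BY_CATEGORY[category];
  const unitsForCategory = CANONICAL_QTY_PER_SOURCE_UNIT[category];
  const qty = unitsForCategory === undefined ? undefined : unitsForCategory[sourceUnit];
  if (unit === undefined || qty === undefined) {
    // Unsafe conversion: drop the row with a warning rather than produce a number.
    return {
      ok: false,
      warning: `dropped: no safe conversion for ${category} priced per ${sourceUnit}`,
    };
  }
  // Price per canonical unit = price per source unit / canonical units in it.
  return { ok: true, unit, price: price / qty };
}
```

For example, a cement price of 90,000 VND per 50 kg bag comes out at about 1,800,000 VND per tonne, while a sand price quoted per truck yields a warning and no price.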
5. Aggregation & publication
After classification and unit normalisation, we compute per (material × province × period) statistics: low, high, average. These are the numbers shown on province and material pages.
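As a rough sketch of that aggregation step, the function below groups normalised rows by (material, province, period) and computes low, high, and average for each group. The `PriceRow` and `Stats` shapes and the string key are hypothetical, and "average" is assumed here to be a plain arithmetic mean over the rows in the group.

```typescript
interface PriceRow {
  material: string;
  province: string;
  period: string; // original bulletin period, e.g. "Q2/2026"
  price: number;  // already converted to the canonical unit
}

interface Stats {
  low: number;
  high: number;
  average: number; // arithmetic mean (assumption)
  count: number;
}

function aggregate(rows: PriceRow[]): Map<string, Stats> {
  // Collect the prices of each (material, province, period) group.
  const groups = new Map<string, number[]>();
  for (const row of rows) {
    const key = `${row.material}|${row.province}|${row.period}`;
    const prices = groups.get(key) ?? [];
    prices.push(row.price);
    groups.set(key, prices);
  }

  // Reduce each group to its low / high / average.
  const stats = new Map<string, Stats>();
  groups.forEach((prices, key) => {
    const sum = prices.reduce((acc, p) => acc + p, 0);
    stats.set(key, {
      low: Math.min(...prices),
      high: Math.max(...prices),
      average: sum / prices.length,
      count: prices.length,
    });
  });
  return stats;
}
```

Because the period is part of the key, rows from an older bulletin never blend into the statistics of a newer one.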
Each table always shows the original bulletin period (e.g. "Q2/2026", "April 2026"). If a provincial department hasn't released a new quarter yet, we keep the previous period and mark it clearly — we never falsely "refresh" using stale data.
When the crawler detects a new period, the table auto-updates within 24 hours. CDN caches are invalidated immediately on database write.
Margins of error and data limitations
Department of Construction prices are published reference prices, not actual dealer transaction prices. The two typically differ by 5–15% because of:
- Freight from warehouse/plant to site (especially for bulky materials like sand, stone, ready-mix).
- Discounts/promotions on order volume and timing (dealers often discount end-quarter to clear stock).
- Raw-material input swings between bulletin periods — rebar can move 5–10% in a single week.
- Contract vs ex-warehouse pricing at tier-2/3 dealers.
For that reason every price on Vật Giá Top is clearly labelled "for reference only" and we always recommend users verify with at least 3 local dealers before signing a contract. This is detailed in our Terms of Service.
What happens when you report an error
If you spot a wrong price (e.g. "PCB30 cement in Hà Nội at 7 M VND/tonne" when the real price is 1.5 M), email [email protected] with:
- The page URL.
- A screenshot of the wrong figure.
- The correct price (if you know it) and your verification source (dealer catalog, contract).
Engineering verifies and fixes within 48 business hours. If the bug is a crawler-side misread of a Department of Construction bulletin, we patch the rule so other provinces don't hit the same trap.
Methodology updates
This page is reviewed at least every 6 months. When the process materially changes (new data source, changed canonical unit, new material category), the "Last updated" entry below changes with a brief changelog.
Related
- About Vật Giá Top — product and team overview.
- Terms of Service — especially §3 on the reference nature of prices.
- Privacy Policy — data we collect from users (separate from Department of Construction price data).
- VLXD API — public endpoints exposing the normalised price data.