
Extracting Data from Mill Certificates: EN 10204 to Structured JSON

Turn EN 10204 mill certificates into structured data — composition, mechanical properties, and heat numbers as JSON or Excel, ready for PIM or DPP.

A mill certificate is one of the most data-dense documents in industry. A single EN 10204 inspection certificate carries the full chemical composition, mechanical properties, heat number, and test results for a batch of steel — and almost all of it is trapped in a PDF.

Every heat generates a new certificate. A fabricator or distributor receives hundreds of them, each in a different supplier's layout and often scanned, yet increasingly needs them in a structured form that a PIM, ERP, quality system, or Digital Product Passport can actually consume. Re-keying them by hand is slow, error-prone, and doesn't scale.

This article covers what a mill certificate actually contains, why the PDF blocks reuse, how to extract the data into a structured format, and where that structured data goes once you have it.

What a Mill Certificate Is (EN 10204)

“Mill certificate,” “mill test report,” “MTC,” and “inspection certificate” all refer to documents standardized by EN 10204, which defines four types of inspection document by how independently the stated properties were verified:

Type | Document | Verified by
2.1 | Declaration of compliance with the order | Manufacturer (no test results)
2.2 | Test report | Manufacturer, non-specific test results
3.1 | Inspection certificate | Manufacturer’s authorised inspection representative (actual test results)
3.2 | Inspection certificate | Countersigned by an independent or customer representative

Most structural and pressure-equipment steel ships with a Type 3.1 certificate containing real measured values. The fields themselves are standardized by EN 10168, which defines the data sections of a steel inspection document: Section A (commercial data), Section B (product description — grade, form, delivery condition), Section C (chemical analysis and mechanical tests), Section D (additional tests), and Section Z (validation). That standardization is exactly what makes mill certificates a strong candidate for automated extraction.
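To make the section structure concrete, here is a minimal sketch that groups extracted fields by their EN 10168 section letter. It assumes each field carries a code that begins with its section letter; the specific codes and values in the example are illustrative placeholders, not taken from a real certificate.

```python
# Section letters and descriptions as listed in the paragraph above.
EN_10168_SECTIONS = {
    "A": "commercial data",
    "B": "product description",
    "C": "chemical analysis and mechanical tests",
    "D": "additional tests",
    "Z": "validation",
}


def group_by_section(fields: dict) -> dict:
    """Group field codes by their leading section letter."""
    grouped = {}
    for code, value in fields.items():
        section = code[0].upper()
        if section not in EN_10168_SECTIONS:
            raise ValueError(f"unknown section letter in field code {code!r}")
        grouped.setdefault(section, {})[code] = value
    return grouped


# Placeholder codes and values, for illustration only.
grouped = group_by_section({"A02": "3.1", "B02": "S355J2", "C12": "510"})
for section, members in grouped.items():
    print(section, EN_10168_SECTIONS[section], members)
```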

The Data Hiding in a Type 3.1 Certificate

A single inspection certificate typically carries all of the following data points, the kind of information downstream systems and regulations increasingly demand at the product level:

Identification. Steel grade (EN 10027), material number, heat/cast number, product form, dimensions, delivery condition, applicable standard (EN 10025, EN 10088).
Chemical composition. Element percentages — C, Mn, Si, P, S, Cr, Ni, Mo, and more — for the ladle and/or product analysis.
Mechanical properties. Yield strength (ReH / Rp0.2), tensile strength (Rm), elongation (A%), Charpy impact energy (KV) at a stated temperature, and hardness.
Verification metadata. The EN 10204 certificate type itself (2.1 / 2.2 / 3.1 / 3.2), test references, and the inspector or representative who signed it.
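As a rough illustration of what "structured" means here, the groups above could be carried in a record like the one below. The class names, field names, and every value (grade, heat number, measurements) are assumptions made up for this sketch; they do not come from a real certificate or from any fixed schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Measurement:
    """One property-value-unit triple, e.g. Rm = 510 MPa."""
    name: str
    value: float
    unit: str


@dataclass
class MillCertificate:
    # Identification and verification metadata
    certificate_type: str  # EN 10204 type: "2.1", "2.2", "3.1" or "3.2"
    grade: str
    heat_number: str
    product_standard: str
    # Chemical composition: element symbol -> mass percent
    composition_pct: dict = field(default_factory=dict)
    # Mechanical properties as a list of triples
    mechanical: list = field(default_factory=list)


# Illustrative values only.
cert = MillCertificate(
    certificate_type="3.1",
    grade="S355J2",
    heat_number="H-000123",
    product_standard="EN 10025-2",
    composition_pct={"C": 0.16, "Mn": 1.45, "Si": 0.20, "P": 0.015, "S": 0.008},
    mechanical=[
        Measurement("ReH", 385.0, "MPa"),
        Measurement("Rm", 510.0, "MPa"),
        Measurement("A", 24.0, "%"),
    ],
)

# asdict() converts nested dataclasses too, so the result serialises directly.
cert_json = json.dumps(asdict(cert), indent=2)
print(cert_json)
```

Keeping the unit next to every value, rather than implied by the field name, is what lets a downstream system convert or validate it without guessing.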

Why the PDF Blocks Reuse

The data exists. The problem is its container. A mill certificate arrives as a PDF — sometimes digitally generated, often a scan — with values spread across tables, headers, and free-text fields, in a layout that differs from every other supplier's. A PIM, ERP, or DPP system cannot read it directly, and three things make manual handling fail at volume:

Table structure carries meaning. A chemical-analysis row only makes sense paired with its column header — flatten the table and the numbers lose their element labels.
Volume guarantees errors. Copy-pasting element percentages and strength values across hundreds of certificates a month is exactly the task humans get wrong.
Layouts and languages vary. Each mill formats differently and ships into multiple markets, so a brittle template-based parser breaks on the next supplier.
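The first point is easy to demonstrate. The sketch below labels each value in a chemical-analysis row by the header cell closest to it horizontally, then compares that with naive pairing by reading order. The (x position, text) pairs are invented for the example; a real extractor would also have to group words into rows and cope with skewed scans.

```python
def label_by_column(header, row):
    """Assign each value to the header cell whose x position is closest."""
    labelled = {}
    for x, text in row:
        nearest_label = min(header, key=lambda cell: abs(cell[0] - x))[1]
        # Some certificates print a decimal comma, so normalise it.
        labelled[nearest_label] = float(text.replace(",", "."))
    return labelled


# (x position, text) pairs, as a PDF text layer or OCR engine might report them.
header = [(100, "C"), (150, "Mn"), (200, "Si"), (250, "P"), (300, "S")]
# The Si cell is empty on this certificate, so the row has only four values.
row = [(101, "0.16"), (149, "1.45"), (251, "0.015"), (299, "0,008")]

by_position = label_by_column(header, row)

# Flattened text keeps only reading order, so every value after the gap shifts left.
by_order = dict(zip([label for _, label in header], [text for _, text in row]))

print(by_position)
print(by_order)
```

With positions, the phosphorus and sulphur values keep their labels and Si is simply absent. Paired by order, the phosphorus value is filed under Si and the sulphur value under P, which is exactly the kind of silent error that is hard to spot later.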

How to Extract Mill Certificate Data into Structured Form

The goal is to turn the certificate into structured fields, with each property, value, and unit identified and labelled, that any downstream system can ingest. Position-aware extraction reads table structure from the document's coordinate data and identifies property-value-unit triples, so “Rm 510 MPa” is captured as a tensile strength of 510 megapascals, not three loose tokens. The output is structured JSON, Excel, or JSON-LD, with each field scored for confidence and anchored to its exact source location so you can verify it against the original.
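A property-value-unit triple can be sketched with a small pattern matcher. This is a deliberately narrow illustration, not a description of how any particular product does it: the pattern only knows a handful of designations and units, and the names `Triple` and `parse_triple` are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    prop: str
    value: float
    unit: str


# Designations and units this sketch recognises; a real extractor needs far more.
TRIPLE_RE = re.compile(
    r"\b(?P<prop>Rp0[.,]2|ReH|Rm|KV|A)\s*[:=]?\s*"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*"
    r"(?P<unit>MPa|N/mm2|J|%)"
)


def parse_triple(text: str) -> Optional[Triple]:
    """Return the first property-value-unit triple found in text, or None."""
    match = TRIPLE_RE.search(text)
    if match is None:
        return None
    # Normalise decimal commas in both the designation and the number.
    prop = match.group("prop").replace(",", ".")
    value = float(match.group("value").replace(",", "."))
    unit = match.group("unit")
    # 1 N/mm2 equals 1 MPa, so both spellings are stored as MPa.
    if unit == "N/mm2":
        unit = "MPa"
    return Triple(prop, value, unit)


print(parse_triple("Rm 510 MPa"))
print(parse_triple("Rp0,2 = 412,5 N/mm2"))
```

Normalising the unit at parse time means a later comparison between two certificates does not have to know that one mill writes N/mm2 and another MPa.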

Because a certificate is a trust document, accuracy is the whole point. A quality audit pass catches the failure modes that matter on a mill certificate — a yield value that differs between the data table and the summary, a missing element in the chemical analysis, an out-of-range figure — before the data flows into your systems.
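The three failure modes named above translate into simple checks. The sketch below is one hypothetical way to express them; the record layout, the required-element list, and the plausibility windows are all assumptions chosen for illustration, and real limits would come from the applicable product standard.

```python
def audit(cert: dict, required_elements: set, plausible_ranges: dict) -> list:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []

    # 1. A property stated in both the table and the summary must agree.
    for prop in sorted(set(cert["table"]) & set(cert["summary"])):
        table_value, summary_value = cert["table"][prop], cert["summary"][prop]
        if table_value != summary_value:
            findings.append(f"{prop}: table says {table_value}, summary says {summary_value}")

    # 2. Every expected element must appear in the chemical analysis.
    for element in sorted(required_elements - set(cert["composition_pct"])):
        findings.append(f"{element}: missing from chemical analysis")

    # 3. Values far outside a plausible window usually mean a misread digit or unit.
    for prop, (low, high) in plausible_ranges.items():
        value = cert["table"].get(prop)
        if value is not None and not low <= value <= high:
            findings.append(f"{prop}: {value} outside plausible range {low}-{high}")

    return findings


# Illustrative record with one problem of each kind.
cert = {
    "table": {"ReH": 385.0, "Rm": 5100.0},
    "summary": {"ReH": 358.0},
    "composition_pct": {"C": 0.16, "Mn": 1.45, "Si": 0.20, "P": 0.015},
}
findings = audit(
    cert,
    required_elements={"C", "Mn", "Si", "P", "S"},
    plausible_ranges={"ReH": (100.0, 1500.0), "Rm": (200.0, 2000.0)},
)
for finding in findings:
    print(finding)
```

The checks only flag; they never correct. Which value is right is a question for the source document, which is why each finding should lead back to a location on the certificate.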


Where Structured Certificate Data Goes

Once a certificate is structured, the same extracted data serves several systems that today each get re-keyed separately:

PIM / ERP and traceability. Material records keyed by heat number, searchable by grade and property — the backbone of quality traceability and stock management.
Digital Product Passport. For steel, the EN 10204 certificate type is itself a DPP attribute, and the composition and mechanical data populate the passport. See the iron & steel DPP requirements below.
CBAM and environmental reporting. Heat-level data underpins embedded-emissions and production-route reporting under the Carbon Border Adjustment Mechanism — the same structured source feeds both.

This is where extraction stops being a convenience and becomes leverage: one structured copy of the certificate, many downstream uses. For the full picture of how steel documentation maps to the coming passport requirements, see our guide to Digital Product Passports for iron & steel, and the JSON-LD export that maps extracted data to schema.org vocabulary.
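For a sense of what a schema.org mapping can look like, the sketch below builds a JSON-LD document from an extracted record using the schema.org `Product` and `PropertyValue` types. The input layout and the choice of which schema.org properties to fill are assumptions for this example; they are not a description of any specific export format.

```python
import json


def to_json_ld(cert: dict) -> dict:
    """Map an extracted certificate record to a schema.org Product."""
    properties = [
        {"@type": "PropertyValue", "name": name, "value": value, "unitText": unit}
        for name, value, unit in cert["mechanical"]
    ]
    properties += [
        {"@type": "PropertyValue", "name": f"{element} content", "value": pct, "unitText": "%"}
        for element, pct in cert["composition_pct"].items()
    ]
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": f"{cert['grade']} heat {cert['heat_number']}",
        "material": cert["grade"],
        "additionalProperty": properties,
    }


# Illustrative values only.
cert = {
    "grade": "S355J2",
    "heat_number": "H-000123",
    "mechanical": [("ReH", 385.0, "MPa"), ("Rm", 510.0, "MPa")],
    "composition_pct": {"C": 0.16, "Mn": 1.45},
}

doc = to_json_ld(cert)
print(json.dumps(doc, indent=2))
```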

Cross-Border Certificates

Steel moves across borders, and so do its certificates. When a certificate needs to be read or reissued in another market's language, the requirement is specific: standardized designations stay fixed (ReH is ReH in every language) while descriptive fields — delivery condition, notes, test descriptions — need accurate, domain-aware translation. A domain-aware translation pipeline that works on the already-structured data keeps the metallurgical terminology engineers recognize, rather than literal dictionary output.
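Working on structured data makes that split easy to enforce. The sketch below uses a plain glossary lookup as a stand-in for a real translation step: only fields marked as descriptive are touched, designations and numbers pass through unchanged, and anything the glossary does not cover is flagged for review instead of guessed. The field names and the two-entry German-to-English glossary are assumptions for the example.

```python
# Descriptive fields that may be translated; everything else passes through.
TRANSLATABLE_FIELDS = {"delivery_condition", "notes"}

# Tiny illustrative glossary (German -> English).
GLOSSARY_DE_EN = {
    "normalgeglüht": "normalized",
    "warmgewalzt": "hot rolled",
}


def translate_fields(record: dict, glossary: dict):
    """Translate descriptive fields only; return the new record and unresolved keys."""
    translated, unresolved = {}, []
    for key, value in record.items():
        if key not in TRANSLATABLE_FIELDS:
            translated[key] = value  # designations and numbers stay fixed
        elif value in glossary:
            translated[key] = glossary[value]
        else:
            translated[key] = value  # keep the original text
            unresolved.append(key)   # and flag it for a human or a fuller pipeline
    return translated, unresolved


record = {
    "ReH": 385.0,
    "Rm": 510.0,
    "delivery_condition": "normalgeglüht",
    "notes": "Oberfläche geprüft",
}

translated, unresolved = translate_fields(record, GLOSSARY_DE_EN)
print(translated)
print(unresolved)
```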

The Data Is Already Yours

Steel producers and distributors already hold everything a PIM, a DPP, or a CBAM report needs — it lives in the mill certificates and inspection reports flowing through the business every day. The missing piece is the layer that turns those PDFs into structured, machine-readable, auditable data.

That is what SpecMake is built for. It reads your certificates — in any layout, scanned or generated, in any language — and extracts every property, value, unit, and standard reference into structured data, with a quality audit that catches errors before they propagate. SpecMake is the data preparation layer for DPP and downstream systems — pair it with the platform of your choice for hosting and registry submission.

The fastest way to see what that looks like on your own documents: take one mill certificate you received this week and run a readiness check. You'll see the structured fields, a confidence-scored extraction, and a DPP coverage score in under a minute — no account needed.

For the regulatory context, see our guides to Digital Product Passports for iron & steel and what the EU DPP registry is and when it goes live.
