🤝

Contribute Unsolved Problems as Benchmarks

If an automated reasoning system fails to solve your input, contributing it as a benchmark helps the whole research community.

Welcome, and thank you for helping! 💛

Sometimes automated reasoning systems struggle — and that’s completely okay! If your input doesn’t get solved, contributing it as a benchmark is an incredibly generous way to help researchers and practitioners understand real-world challenges and make solvers better for everyone (SAT, PBO, MAXSAT, #SAT, SMT, ATP).

This page explains practical steps, what to include, and community-accepted formats for each subfield.

Why your contribution matters

Benchmarks drive progress — they reveal weaknesses and inspire algorithmic advances.
Public benchmarks enable reproducible evaluation and fair comparisons.
Real-world, unsolved instances are often more valuable than synthetic ones.

🧩 What to include with a contributed benchmark

The instance file in the standard format for the community (examples below).
Description, a brief description what the benchmarks is about, i.e., what it models, or even a link to paper or some other medium with more details
Minimal reproduction: the exact input you gave to the solver (no hidden preprocessing unless documented).
Solver invocation: command line, solver version, options, timeout, hardware used.
Observed behaviour: timeout, wrong result, crash, memory error, indefinite run.
Expected/desired outcome if known (satisfiable/unsatisfiable/counterexample/proof).
Metadata: origin, license, short description, tags (optimization / combinatorics / verification), difficulty estimate.
Optional: verification artifacts (proof traces, logs), model witness, or expected cost/value for optimization problems.

📚 Community-specific formats & clear statements

SAT (Boolean satisfiability)

Why it helps: SAT is a core technology used in scheduling, security, hardware and more. Hard cases move the whole field forward!

Meet the state-of-the-art SAT solvers at the SAT competition.

Submission: For SAT, submit instances in DIMACS CNF format (plain .cnf). Provide the exact file you fed the solver, your solver command line, timeout, and the solver's result (SAT/UNSAT/TIMEOUT/CRASH).

c Example comment lines start with 'c'
p cnf 6 9
1 -3 0
... (clauses) ...

PBO (Pseudo-Boolean Optimization)

Why it helps: Optimization problems show where solvers struggle to find the best solution, not just any solution.

Meet the state-of-the-art PBO solvers at the PB competition.

Submission: For PBO, provide the instance in the standard .opb (OPB) format or other community-accepted formats. Include the optimization objective and any assumptions used. State the expected optimality (if known) or that the optimum is unknown.

* #variable= 5 #constraint= 3
+1 x1 -2 x2 +3 x3 <= 4;
min: 2 x1 + 3 x2 + 1 x3;

MAXSAT

Why it helps: MAXSAT benchmarks highlight how solvers balance many conflicting goals.

Meet the state-of-the-art MAXSAT solvers at the MAXSAT evaluation.

Submission: For (Weighted) MAXSAT, use WCNF or CNF with soft/hard annotations. Provide weights, whether the instance is partial or weighted, and any target optimum if known.

p wcnf 4 6 10
1 1 -3 0
10 2 -4 0  (weight 10 soft clause)

#SAT (Model counting / Weighted model counting)

Why it helps: Counting solutions pushes reasoning into probabilistic and AI applications.

Meet the state-of-the-art model counters at the Model Counting competition.

Submission: For #SAT, submit the CNF instance, and indicate whether the count should be exact or approximate and whether it's weighted. If available, supply the known count or bounds, and the intended variable set to count over.

SMT (Satisfiability Modulo Theories)

Why it helps: SMT benchmarks test reasoning with arithmetic, bitvectors, arrays — essential in software and hardware verification.

Meet the state-of-the-art SMT solvers at the SMT competition.

Submission: For SMT, use SMT-LIB v2 format (.smt2). Declare logic (QF_BV, QF_UF, QF_LIA, etc.) and provide the exact file and invocation. If the formula is intended for satisfiability or for producing a model/proof, state that explicitly.

; example.smt2
(set-logic QF_BV)
(declare-fun x () (_ BitVec 32))
(assert (= x #x00000000))
(check-sat)
(get-model)

ATP (Automated Theorem Proving)

Why it helps: Testing ATP systems keeps mathematical reasoning systems trustworthy and ambitious.

Meet the state-of-the-art ATP systems at the CADE ATP System Competition (CASC).

Submission: For ATP (first-order logic, higher-order logic), use the TPTP format (.p, .tptp) and clearly label conjectures, axioms, and theory language (FOF, CNF, THF). Indicate expected result (theorem / countersatisfiable) and supply any existing proofs/traces if available.

% File: example.p
fof(ax1, axiom, (p(a))).
fof(conj, conjecture, (exists X: p(X))).

MiniZinc

Why it helps: push forward the boundaries in solving discrete optimisation problems.

Meet the state-of-the-art MiniZinc at the MiniZinc Challenge.

Submission: TODO.

XCSP3

Why it helps: push forward the boundaries in solving discrete decision and optimisation problems.

Meet the state-of-the-art XCSP solvers at the XCSP competition.

Submission: TODO.

Termination

Why it helps: Proving termination is essential in many areas like software verification, automated induction, theorem proving, ...

Meet the state-of-the-art termination tools at the Termination Competition.

Submission: Please create a pull request for this repository. For term rewrite systems and transition systems, please use the ARI format. For programming languages, you can check which fragments of the languages are supported here: C, Java Bytecode, Haskell, Prolog

✅ Practical checklist to attach to a submission

Instance file (in community format)
Description (a short text or a link to a detailed description)
Solver command used
Timeout & machine
Metadata (origin, license)
Optional logs

🔐 Licensing, privacy & sensitive data

Please ensure you are allowed to share the instance. If it contains sensitive or proprietary data, either anonymize it or do not share it publicly. A simple permissive license recommendation for benchmarks is CC0 or explicit public-domain dedication when possible; otherwise choose a license that allows research use (e.g. CC BY or CC BY-NC).

✍️ Example submission template (text you can paste)

Title: Short descriptive title
Description: (textual description, url)
Format: CNF / WCNF / OPB / SMT-LIB / TPTP
Filename: my-instance.cnf
Solver: name + version
Command: mysolver --timeout=600 my-instance.cnf
Observed: TIMEOUT after 600s (no solution), no crash
Expected: unknown (believed SAT)
Metadata: tags: scheduling, hardware-bug; license: CC0; author: Alice
Artifacts: solver.log (attached)

💛 All contributions are welcome

We know contributing a benchmark takes a bit of effort — and we deeply appreciate it. Your contribution, no matter how small, helps build systems that solve problems more reliably for everyone. Thank you for being part of a kind and collaborative community! 🌟

🤝 A friendly note for solver developers

If your solver times out or encounters a difficult instance, you can help build a kinder, more collaborative research ecosystem by suggesting users share those tough problems as benchmarks.

Consider adding a friendly message in your solver’s timeout logs or output, such as:

Timeout reached! 😅 This instance is a real thinker.
Please consider contributing it to improve automated reasoning: mysolvertimesout.org

The following solvers support this initiative: Sat4j.

About this initiative

This web page is maintained by Daniel Le Berre. Contributions are welcome using either issues or pull requests. A discussion space is also available.

This initiative emerged from the discussions during the Dagstuhl Seminar 25441 on Competitions and Empirical Evaluations in Automated Reasoning.