Sometimes automated reasoning systems struggle β and thatβs completely okay! If your input doesnβt get solved, contributing it as a benchmark is an incredibly generous way to help researchers and practitioners understand real-world challenges and make solvers better for everyone (SAT, PBO, MAXSAT, #SAT, SMT, ATP).
This page explains practical steps, what to include, and community-accepted formats for each subfield.
Why it helps: SAT is a core technology used in scheduling, security, hardware and more. Hard cases move the whole field forward!
Meet the state-of-the-art SAT solvers at the SAT competition.
Submission: For SAT, submit instances in DIMACS CNF format (plain .cnf). Provide the exact file you fed the solver, your solver command line, timeout, and the solver's result (SAT/UNSAT/TIMEOUT/CRASH).
c Example comment lines start with 'c' p cnf 6 9 1 -3 0 ... (clauses) ...
Why it helps: Optimization problems show where solvers struggle to find the best solution, not just any solution.
Meet the state-of-the-art PBO solvers at the PB competition.
Submission: For PBO, provide the instance in the standard .opb (OPB) format or other community-accepted formats. Include the optimization objective and any assumptions used. State the expected optimality (if known) or that the optimum is unknown.
* #variable= 5 #constraint= 3 +1 x1 -2 x2 +3 x3 <= 4; min: 2 x1 + 3 x2 + 1 x3;
Why it helps: MAXSAT benchmarks highlight how solvers balance many conflicting goals.
Meet the state-of-the-art MAXSAT solvers at the MAXSAT evaluation.
Submission: For (Weighted) MAXSAT, use WCNF or CNF with soft/hard annotations. Provide weights, whether the instance is partial or weighted, and any target optimum if known.
p wcnf 4 6 10 1 1 -3 0 10 2 -4 0 (weight 10 soft clause)
Why it helps: Counting solutions pushes reasoning into probabilistic and AI applications.
Meet the state-of-the-art model counters at the Model Counting competition.
Submission: For #SAT, submit the CNF instance, and indicate whether the count should be exact or approximate and whether it's weighted. If available, supply the known count or bounds, and the intended variable set to count over.
Why it helps: SMT benchmarks test reasoning with arithmetic, bitvectors, arrays β essential in software and hardware verification.
Meet the state-of-the-art SMT solvers at the SMT competition.
Submission: For SMT, use SMT-LIB v2 format (.smt2). Declare logic (QF_BV, QF_UF, QF_LIA, etc.) and provide the exact file and invocation. If the formula is intended for satisfiability or for producing a model/proof, state that explicitly.
; example.smt2 (set-logic QF_BV) (declare-fun x () (_ BitVec 32)) (assert (= x #x00000000)) (check-sat) (get-model)
Why it helps: Testing ATP systems keeps mathematical reasoning systems trustworthy and ambitious.
Meet the state-of-the-art ATP systems at the CADE ATP System Competition (CASC).
Submission: For ATP (first-order logic, higher-order logic), use the TPTP format (.p, .tptp) and clearly label conjectures, axioms, and theory language (FOF, CNF, THF). Indicate expected result (theorem / countersatisfiable) and supply any existing proofs/traces if available.
% File: example.p fof(ax1, axiom, (p(a))). fof(conj, conjecture, (exists X: p(X))).
Why it helps: push forward the boundaries in solving discrete optimisation problems.
Meet the state-of-the-art MiniZinc at the MiniZinc Challenge.
Submission: TODO.
Why it helps: push forward the boundaries in solving discrete decision and optimisation problems.
Meet the state-of-the-art XCSP solvers at the XCSP competition.
Submission: TODO.
Please ensure you are allowed to share the instance. If it contains sensitive or proprietary data, either anonymize it or do not share it publicly. A simple permissive license recommendation for benchmarks is CC0 or explicit public-domain dedication when possible; otherwise choose a license that allows research use (e.g. CC BY or CC BY-NC).
Title: Short descriptive title
Description: (textual description, url)
Format: CNF / WCNF / OPB / SMT-LIB / TPTP
Filename: my-instance.cnf
Solver: name + version
Command: mysolver --timeout=600 my-instance.cnf
Observed: TIMEOUT after 600s (no solution), no crash
Expected: unknown (believed SAT)
Metadata: tags: scheduling, hardware-bug; license: CC0; author: Alice
Artifacts: solver.log (attached)
      
    We know contributing a benchmark takes a bit of effort β and we deeply appreciate it. Your contribution, no matter how small, helps build systems that solve problems more reliably for everyone. Thank you for being part of a kind and collaborative community! π
If your solver times out or encounters a difficult instance, you can help build a kinder, more collaborative research ecosystem by suggesting users share those tough problems as benchmarks.
Consider adding a friendly message in your solverβs timeout logs or output, such as:
Timeout reached! π This instance is a real thinker. Please consider contributing it to improve automated reasoning: mysolvertimesout.org
The following solvers support this initiative: Sat4j.
This web page is maintained by Daniel Le Berre. Contributions are welcome using either issues or pull requests. A discussion space is also available.
This initiative emerged from the discussions during the Dagstuhl Seminar 25441 on Competitions and Empirical Evaluations in Automated Reasoning.