Olatunde Salami Dark Data · Collection

Where US oil data actually lives.

Industry says 99.5% of subsurface data is dark. That number is true and useless, too abstract to act on. This is a basin-by-basin walk through what the US oil and gas industry can actually see of its own history: every drilled well in a state, classified by whether a machine-readable log survived.

One basin at a time. Same method, same palette, same cuts, so the shape of the gap can be compared across geology and century.

Map of the contiguous United States with six oil and gas basins outlined in distinct colors: Williston in violet across North Dakota and Montana, Permian in amber across West Texas and New Mexico, Anadarko in teal across Oklahoma, Kansas in clay, Appalachia in olive across Pennsylvania, West Virginia, and Ohio, and Eagle Ford in slate across South Texas.
New · Field guide

Start with the basins.

A reader's primer to the six US oil and gas basins covered in this series. Where they are. What rock they target. When they were drilled. Click any basin to inspect, then dive into its full dark-data walk.

6 basins · 5 view modes · deck.gl + maplibre

Running Totals

6 basins shipped
1,921,013 wells mapped
89.1% dark measured
1 architectural wall

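The two headline totals can be re-derived from the per-basin figures quoted in the basin cards further down. What follows is a minimal sketch, assuming each basin's dark count is its well count times the published (rounded) dark percentage, with Kansas computed from its exact LAS count. The variable names are illustrative and are not taken from the project's notebooks.

```python
# Per-basin well counts and dark shares as quoted in the basin cards.
# Dark shares other than Kansas are rounded percentages, so the
# reconstructed dark counts are approximate.
basins = {
    "Kansas": (419_777, 1 - 23_897 / 419_777),
    "Permian": (393_073, 0.779),
    "Williston": (45_921, 0.457),
    "Anadarko": (482_918, 0.911),
    "Appalachia": (579_324, 0.948),
}

total_wells = sum(wells for wells, _ in basins.values())
total_dark = sum(wells * dark for wells, dark in basins.values())

# Well-weighted dark share across the five measurable basins.
dark_share = total_dark / total_wells

print(f"{total_wells:,} wells mapped")
print(f"{dark_share:.1%} dark (well-weighted)")
```

An unweighted mean of the five percentages would come out near 80.8%, so the 89.1% headline is consistent with a well-weighted average rather than a simple average of basins.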
The thesis, on one chart

Horizontal eras only rescue young basins.

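Five measurable basins, two axes. Y axis: percent dark. X axis: share of the basin's well count drilled in the post-2008 horizontal era. The pattern is close to monotonic: dark share falls as horizontal-era share rises, and the only reversal is between Kansas and Appalachia, which both sit near zero horizontal share and within a point of each other. Williston is the only basin where the horizontal era accounts for close to half the well count, and it is the only basin in the lit majority. Eagle Ford sits off the chart because TX RRC's data architecture would not let us measure it.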

Scatter plot. Y axis: percent dark, 35 to 100 percent. X axis: horizontal-era share of basin well count, 0 to 55 percent. Five points: Williston at 47 percent horizontal share and 46 percent dark, Permian at 28 percent and 78 percent dark, Anadarko at 6 percent and 91 percent dark, Kansas at 0 percent and 94 percent dark, Appalachia at 5 percent and 95 percent dark. A dashed trend line drops from 97 percent dark at zero horizontal share to 40 percent dark at 50 percent horizontal share. A side panel notes that Eagle Ford is unmeasured because TX RRC does not publish bulk well data, and that the predicted dark share for Eagle Ford given EIA's well-count estimates would land in the high eighties.
The single image of the dark-data series. Five basins prove the rule, the sixth proves the architectural exception. The full breakdown lives in the basin walks below; the primer at /darkdata/basins.html lets you toggle the same six basins through five different lenses.
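
One way to test the chart's ordering is to sort the five points by horizontal-era share and look for steps where dark share goes up instead of down. The sketch below uses the rounded values from the chart description. It is an illustrative check, not the series' own plotting code.

```python
# Rounded values from the scatter-plot description:
# basin -> (horizontal-era share %, dark share %)
points = {
    "Williston": (47, 46),
    "Permian": (28, 78),
    "Anadarko": (6, 91),
    "Kansas": (0, 94),
    "Appalachia": (5, 95),
}

# Order basins from lowest to highest horizontal-era share.
ordered = sorted(points.items(), key=lambda item: item[1][0])

# An inversion is a consecutive pair where dark share rises
# even though horizontal-era share also rose.
inversions = [
    (a, b)
    for (a, (_, dark_a)), (b, (_, dark_b)) in zip(ordered, ordered[1:])
    if dark_b > dark_a
]

for name, (horiz, dark) in ordered:
    print(f"{name:<11} horizontal {horiz:>2}%  dark {dark:>2}%")
print("inversions:", inversions)
```

On these values the ordering holds for every consecutive pair except Kansas to Appalachia, two basins that both sit near zero horizontal share and differ by about half a point of dark share in the basin cards.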

The Basins

Kansas, Mid-Continent shelf

419,777 drilled wells. 23,897 with a public LAS. The other 94.3% are dark. A century-wide east–west gradient from the 1920s shallow plays to the 2010s horizontal boom, with 46% of the dark universe sitting under operators that no longer exist.

KGS master list · 1920s–today · Choropleth · Cherokee close-up · April 2026
Live

Permian, West Texas & SE New Mexico

393,073 drilled wells across TX RRC and NM OCD. 22.1% are reachable by a public analyst; the other 77.9% are dark. The lit wells concentrate in the Midland Basin core and Eddy County, the exact footprint of the horizontal Wolfcamp–Bone Spring campaign. 15,601 NM wells belong to PRE-ONGARD, a placeholder for operators that never migrated to the digital era. Dark data is what an industry stops caring about.

TX RRC + NM OCD · 1920s–today · Midland · Delaware · Shelf · April 2026
Live

Williston, Bakken & Three Forks

45,921 wells clipped to the USGS Bakken & Three Forks TPS polygon across North Dakota and Montana. 54.3% lit, 45.7% dark: the lowest dark share in the series so far. The 2010+ horizontal Bakken carried the basin to the majority-lit side; everything pre-2000 is still paper. The dark share inverts at the state line (ND 43.3%, MT 64.9%) because Montana has no horizontal flag to lean on.

NDIC + MT BOGC + USGS TPS · 1950s–today · Bakken core · Elm Coulee · April 2026
Live

Anadarko, SCOOP/STACK & Hugoton

482,918 wells clipped to the USGS Anadarko Basin Province polygon across Kansas and Oklahoma. 8.9% lit, 91.1% dark: the third-highest dark share in the series, behind Appalachia (94.8%) and Kansas (94.3%). SCOOP and STACK lit a thin south-rim arc (27,396 OK horizontals), but it sits on top of a quarter-million pre-1980 vertical wells in the Hugoton Embayment and the Anadarko Shelf. The Texas Panhandle slice of the basin is excluded and deferred to basin 06.

OK OCC + KGS + USGS Province 058 · 1900s–today · SCOOP · STACK · Hugoton · April 2026
Live

Appalachia, Marcellus & Utica

579,324 wells clipped to the EIA Appalachian shale envelope across Pennsylvania, West Virginia, and Ohio. 5.2% lit, 94.8% dark: the highest dark share in the series, in the basin Drake spudded in 1859. PA's UNCONVENTI flag captures 23,237 Marcellus horizontals; OH's SLANT field catches 4,670 Utica wells; WV is 100% dark by data architecture (its 2016 well-location file and its 2024 H6A horizontal-production roll do not link). 165 years of vertical legacy overwhelms the modern era.

PA DEP + WV DEP + OH DNR + EIA · 1859–today · Marcellus · Utica · Drake · April 2026
Live

Eagle Ford, South Texas (finale)

The series finale. Seven USGS Eagle Ford Group Assessment Units across 76 South Texas counties, ~30,000 horizontal wells permitted since 2008, ~500,000 conventional wells of every vintage drilled around them, and none of them can be pulled as a public bulk dataset. TX Railroad Commission ships its bulk well data through a JSF/PrimeFaces session portal backed by EBCDIC mainframe dumps. The architecture itself is the dark-data verdict. This basin ships without per-well data on purpose.

USGS Eagle Ford AUs + EIA DPR · 2007–today (production aggregate) · Karnes · Maverick · Live Oak · April 2026
Live
Dark data is a stratigraphy of time. Every basin has a different century missing.

Method: the same four cuts in every basin, plus a close-up and an open-source release

01 · The state list

Pull the regulator’s master inventory of drilled oil & gas wells. Every state has one; the field coverage varies.

02 · The log archive

Cross-reference the public LAS/digital-log index. The gap between list and archive is the dark universe for that basin.

03 · The time cut

Classify every well by spud decade. The coverage curve almost always reveals the year LAS became the default.

04 · The operator cut

Join to the current operator-of-record. The orphan share (wells whose last operator no longer exists) is where recovery is hardest.

05 · The play close-up

One zoom per basin, into the biggest concentrated play. Same palette, same cuts. Lets readers compare Cherokee vs Wolfcamp vs Bakken on the same frame.

06 · Open source

Every basin ships with the raw CSV, the notebook, and the map source. If the method stops being the constant, this stops being this project.
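
Cuts 01 through 04 above can be sketched on toy data as an anti-join followed by two group-bys. This is a minimal illustration using only the standard library. The field names (`api`, `spud_year`, `operator`), the records, and the operator names are all hypothetical and do not reflect any regulator's actual schema or the project's notebooks.

```python
from collections import Counter

# Toy stand-ins for a regulator master list (cut 01) and a public
# LAS index (cut 02). All names and records are hypothetical.
master_list = [
    {"api": "A1", "spud_year": 1954, "operator": "Oldco"},
    {"api": "A2", "spud_year": 1978, "operator": "Oldco"},
    {"api": "A3", "spud_year": 1996, "operator": "Midco"},
    {"api": "A4", "spud_year": 2014, "operator": "Newco"},
    {"api": "A5", "spud_year": 2016, "operator": "Newco"},
]
las_index = {"A4", "A5"}               # wells with a public digital log
active_operators = {"Midco", "Newco"}  # operators that still exist

# Cuts 01 + 02: a well is dark when it is on the list but not in the archive.
dark = [w for w in master_list if w["api"] not in las_index]
dark_share = len(dark) / len(master_list)


def decade(year):
    """Floor a spud year to its decade, e.g. 1978 -> 1970."""
    return year // 10 * 10


# Cut 03: log coverage by spud decade.
drilled = Counter(decade(w["spud_year"]) for w in master_list)
lit = Counter(decade(w["spud_year"]) for w in master_list if w["api"] in las_index)
coverage_by_decade = {d: lit[d] / drilled[d] for d in sorted(drilled)}

# Cut 04: share of dark wells whose last operator no longer exists.
orphans = [w for w in dark if w["operator"] not in active_operators]
orphan_share = len(orphans) / len(dark)

print(f"dark share: {dark_share:.1%}")
print("coverage by decade:", coverage_by_decade)
print(f"orphan share of dark wells: {orphan_share:.1%}")
```

Keeping the dark set as the single input to the operator cut mirrors how the basin cards report orphans: as a share of the dark universe, not of all drilled wells.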

Why This Series

Every conversation about AI in the subsurface assumes the training data exists. The maps in this collection ask a slower question first: where does the data actually live, and what shape does the gap take?

If the answer for your basin is “mostly dark, mostly orphaned, mostly pre-1990,” that’s a different product than the one the demos are selling. The series is an honest inventory, basin by basin.