Project 03 · Subsurface · Walk 06 (finale)

Walking the Eagle Ford.

Five basins and 1.92 million wells later, this is the basin we cannot show you. The Texas Railroad Commission (TX RRC) ships its bulk well data through a JSF session portal backed by EBCDIC mainframe dumps. The wells exist. The architecture decided we cannot count them.

01 · The question this basin answers

What is dark data, exactly?

The first five basins answered "how much": Williston 46% dark, Permian 78%, Anadarko 91%, Kansas 94%, Appalachia 95%. The numbers settled into a pattern. The shapes settled into a verdict. The pattern said horizontal eras don't rescue old basins; the verdict said most US oil and gas data is structurally dark.

Eagle Ford is the basin that answers the deeper question. Not how much is dark, but what dark IS. We chose this basin last on purpose. The architecture itself is the answer.

By the time we publish, the regulators behind the other five basins have done their job: they shipped the wells in a form we can plot. TX RRC is the regulator that didn't.

02 · The basin

USGS Eagle Ford Group, 76 South Texas counties.

The 2018 USGS Eagle Ford Group Assessment ships seven Continuous AU polygons covering the Cenomanian-Turonian Mudstone, the Eagle Ford Marl, and the Submarine Plateau-Karnes Trough plays. They span 76 Texas counties from Maverick on the Mexico border up through Karnes and DeWitt and out to Brazos and Madison. The map you are looking at shows the basin and the counties.

What is missing from this map: the wells. EIA estimates that more than 30,000 horizontal wells have been permitted in the Eagle Ford since 2008, plus roughly half a million conventional wells of every vintage drilled around them. None of them are dots on this page.

Compare to Williston, where we plotted 45,921 wells inside the basin polygon, or Anadarko, where we plotted 482,918. Same series, same code, same intent. Eagle Ford ships empty.

03 · The data hunt

Five attempts, five dead ends.

We probed the obvious paths first. TX RRC's ArcGIS REST host (gis2.rrc.texas.gov) is unreachable from this network. The HTML viewer at gis.rrc.texas.gov requires a Silverlight-era control. The HIFLD national oil/gas wells layer requires an ArcGIS Online token. The Texas Open Data Portal has no oil/gas wells dataset. The TX Bureau of Economic Geology does not publish a public well shapefile.

HEAD https://gis2.rrc.texas.gov/... network unreachable
HEAD https://gis.rrc.texas.gov/files/OG_WELLS.zip 404
HEAD https://services.arcgis.com/.../Oil_and_Natural_Gas_Wells Token Required

Each of these has a public PA, OK, ND, KS, or WV equivalent that ships in a single click. The Texas versions do not.
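A minimal sketch of how probes like the ones logged above could be scripted, using only the Python standard library. The function name `probe`, the label strings, and the injectable `opener` parameter are our own illustrative choices, not the tooling actually used for this walk.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a HEAD request and summarize the outcome as a short label.

    `opener` is injectable so the function can be exercised offline;
    by default it is the standard urllib opener.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "unknown")
            return f"{response.status} {content_type}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return f"{err.code} {err.reason}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout: no HTTP answer at all.
        reason = getattr(err, "reason", err)
        return f"unreachable ({reason})"
```

HTTPError is caught before URLError because it is a subclass of it. Note that a HEAD probe only sees status and headers; a service that answers 200 and then asks for a token in the response body would need a GET and a look at the payload.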

04 · The JSF wall

The bulk-data portal has a session timer.

TX RRC's MFT (Managed File Transfer) portal lists more than 200 direct download links to oil-and-gas datasets: Statewide API Data, Drilling Permit Master, Wellbore Query Data, Completion Information in Data Format. We tried them all.

Each link returns a JSF page with a five-minute session timer instead of a file. The page is generated by a PrimeFaces-stack web app. Download requires the session cookie, the JSF ViewState token, and a click on a control that POSTs back to the same JSF state. Vanilla HTTP cannot fetch the file.

HEAD https://mft.rrc.texas.gov/link/d551fb20-... 200 OK
Content-Type: text/html;charset=UTF-8 (not the file)
<title>RRC Web Client - GoDrive</title>
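The tell in that response is the Content-Type: a 200 that carries HTML is a landing page, not a dataset. A hedged sketch of that check follows. The helper names are hypothetical, and the ViewState marker strings are the standard hidden-field names used by JSF and Jakarta Faces in general; we have not confirmed which one the RRC page emits.

```python
HTML_TYPES = {"", "text/html", "application/xhtml+xml"}
VIEWSTATE_MARKERS = ("javax.faces.ViewState", "jakarta.faces.ViewState")


def looks_like_file(headers):
    """Guess from response headers whether the body is a downloadable file."""
    normalized = {key.lower(): value for key, value in headers.items()}
    disposition = normalized.get("content-disposition", "").lower()
    if "attachment" in disposition:
        return True
    # Drop parameters such as ";charset=UTF-8" before comparing.
    content_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    return content_type not in HTML_TYPES


def is_jsf_page(body):
    """True if the HTML body carries a JSF ViewState hidden field."""
    return any(marker in body for marker in VIEWSTATE_MARKERS)
```

With the headers logged above, `looks_like_file({"Content-Type": "text/html;charset=UTF-8"})` returns False, which is the whole problem in one line.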

We do not run Selenium scrapers on regulator portals at scale. That is not a methodological purity claim; it is a cost claim. Each basin in this series shipped in 3 to 5 hours of pull, parse, and join. Texas would be 2 to 3 days of fragile session-replay code.

05 · The EBCDIC layer

The older records are mainframe.

Behind the JSF portal sits an even older layer: EBCDIC fixed-width files originally produced for an IBM mainframe. The TX RRC dataset list shows EBCDIC versions of Statewide Oil Production, Statewide Gas Production, Historical Ledger Statewide Oil, Statewide Gas Ledger Districts 1 through 10, and Certificate of Authorization P-4. The ASCII versions are companions, not replacements.

You can decode EBCDIC. You can write a fixed-width parser. The economics of doing it for a research-journal basin walk do not pencil out, especially when North Dakota ships its entire well inventory as a 43 MB shapefile that opens in QGIS in three seconds.
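To make "you can decode EBCDIC" concrete, here is a minimal sketch using Python's built-in `cp037` codec (one common EBCDIC code page). The record layout, field names, and record length below are entirely hypothetical; the real TX RRC layouts are defined in the agency's own file documentation, and real mainframe dumps may also carry packed-decimal fields that this sketch does not handle.

```python
# Hypothetical layout: (field name, start offset, length), all in bytes.
LAYOUT = [
    ("api_number", 0, 8),
    ("lease_name", 8, 16),
    ("county_code", 24, 3),
]
RECORD_LENGTH = 27  # sum of the hypothetical field lengths


def parse_record(raw, codec="cp037"):
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    # cp037 is a single-byte code page, so byte offsets equal character offsets.
    text = raw.decode(codec)
    return {name: text[start:start + length].strip() for name, start, length in LAYOUT}


def parse_file(blob, codec="cp037"):
    """Split a byte blob into fixed-length records and parse each one."""
    last_start = len(blob) - RECORD_LENGTH
    return [
        parse_record(blob[offset:offset + RECORD_LENGTH], codec)
        for offset in range(0, last_start + 1, RECORD_LENGTH)
    ]
```

The parser is the easy part. The cost sits in recovering the correct layout for every dataset and vintage, which is why the paragraph above is a cost argument rather than a feasibility one.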

"The state did not migrate" is the cleanest way to describe what TX RRC is. Other regulators upgraded their data publishing posture between 1995 and 2015. TX kept the mainframe.

06 · The cooperative regulators

Five basins shipped. One walled.

basin 01 · Kansas · KGS · 94.3% dark
basin 02 · Permian · TX RRC + NM OCD · 77.9% dark
basin 03 · Williston · NDIC + MT BOGC · 45.7% dark
basin 04 · Anadarko · OCC + KGS · 91.1% dark
basin 05 · Appalachia · PA DEP + WV DEP + OH DOG · 94.8% dark
basin 06 · Eagle Ford · TX RRC · unmeasured

The Permian (basin 02) is the only basin that crossed Texas and survived. It survived because we relied on the New Mexico OCD dataset for the lit population, and because the eastern Delaware basin counties carry enough metadata to make the dark-share estimate without a TX RRC join.

07 · The accessibility ranking

Six basins, nine regulators, one outlier.

ND NDIC · 100
OK OCC · 100
PA DEP · 95
KS KGS · 90
NM OCD · 80
MT BOGC · 75
OH DOG · 75
WV DEP · 50
TX RRC · 5

Eight regulators publish bulk well data we can pull. One does not. The cooperative regulators are not all equally polished, but every one of them shipped the wells. The outlier is not on a continuum with the rest.

Score is a working analyst's rating: 100 = direct CMS download, no auth, no session. 0 = inaccessible without browser automation. The full table is in regulatory_comparison.json.
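One way to read the ranking is to look at the gaps between neighbours rather than the scores themselves. The sketch below hard-codes the nine scores from this section (it does not read regulatory_comparison.json, whose schema is not shown here), and `largest_gap` is a hypothetical helper name.

```python
SCORES = {
    "ND NDIC": 100, "OK OCC": 100, "PA DEP": 95, "KS KGS": 90, "NM OCD": 80,
    "MT BOGC": 75, "OH DOG": 75, "WV DEP": 50, "TX RRC": 5,
}


def largest_gap(scores):
    """Return (gap, higher regulator, lower regulator) for the widest step."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    gaps = [
        (upper[1] - lower[1], upper[0], lower[0])
        for upper, lower in zip(ranked, ranked[1:])
    ]
    return max(gaps)


print(largest_gap(SCORES))  # (45, 'WV DEP', 'TX RRC')
```

The widest step in the table, 45 points, is the one between the last regulator that ships wells and the one that does not.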

08 · What we know anyway

279 rigs in May 2012.

EIA's Drilling Productivity Report aggregates Eagle Ford rig count and production at the region level. We can see the rig count rise from under 50 in 2008 to a peak of 279 in May 2012. Oil production followed, peaking at 1.72 million barrels per day in March 2015. The 2014-2016 oil price collapse cut the rig count to 38 by 2016, but production only fell from 1.72 to 1.05 MMbbl/d. The wells that were already drilled kept producing.

The wells we cannot see produced 1.0 to 1.7 million barrels of oil a day for ten years. They are real. They are documented somewhere inside TX RRC. We just cannot pull them.
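The decoupling of rigs from production is easier to see as percentages. A small worked calculation, using only the EIA figures quoted in this section (`percent_decline` is our own helper name):

```python
def percent_decline(peak, trough):
    """Decline from peak to trough, as a percentage of the peak."""
    return (peak - trough) / peak * 100


rig_drop = percent_decline(279, 38)      # rigs, May 2012 peak to 2016
oil_drop = percent_decline(1.72, 1.05)   # MMbbl/d, March 2015 peak to late 2016

print(f"rigs: -{rig_drop:.0f}%, oil: -{oil_drop:.0f}%")  # rigs: -86%, oil: -39%
```

Rigs fell by roughly 86 percent while oil output fell by roughly 39 percent, which is the arithmetic behind "the wells that were already drilled kept producing."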

09 · The boom shape

2008 to 2015, then a plateau.

Eagle Ford's rig count went from 13 to 279 in 50 months. That is the fastest rig-count buildup of any basin in the EIA DPR data. The Marcellus took 60 months to add as many; the Williston took 70.

Production peaked in March 2015 at 1.72 MMbbl/d, fell to 1.05 in late 2016, and has held a 1.0 to 1.3 MMbbl/d plateau since 2018 with progressively fewer rigs needed to sustain it. The plateau is the operational success of the play. The plateau is also the record we cannot index.

A successful basin can also be a dark basin. The two are not in tension.

10 · What we cannot show

Per-well lat, lon, vintage, dark/lit.

For each of the other five basins we ship a CSV of every basin-clipped well, six columns, 25 to 580 thousand rows. Eagle Ford has no equivalent. We chose this basin last on purpose, with the working assumption that we would have to ship without it. We confirm that here.

What we cannot show: where the wells are. What vintage they are. Which operator drilled which one. Which ones were horizontals targeting the Eagle Ford Marl in Karnes and which ones were 1960s vertical Wilcox wells in Live Oak.

If TX RRC publishes a bulk well shapefile, this basin gets a v2 with a wells.csv. The pipeline is built. The slot is empty.
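For readers unfamiliar with what a basin clip involves, here is a minimal, dependency-free sketch of the idea: keep only the wells whose coordinates fall inside a polygon. It is an illustration under simplifying assumptions (a single ring, planar lon/lat, no holes or multipolygons), not the pipeline used in this series. The function names and the `lon`/`lat` keys are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % count]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_wells(wells, ring):
    """Keep the wells (dicts with 'lon' and 'lat') that fall inside the ring."""
    return [well for well in wells if point_in_polygon(well["lon"], well["lat"], ring)]
```

A production version would more likely lean on a geospatial library and the seven USGS AU polygons directly; the point here is only that the clip itself is cheap once the well coordinates exist.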

11 · The pre-digital floor

South Texas was drilled long before 2008.

The Eagle Ford horizontal era started in 2008 with EOG's first wells in La Salle County. The Wilcox sandstone gas plays of Live Oak and Bee counties date to the 1940s. Spraberry-Wilcox conventional oil in McMullen and Atascosa goes back to the 1930s. Frio gas in the southern counties has been drilled since the 1920s.

The 30,000 Eagle Ford horizontals are the lit layer. The 470,000 conventional wells underneath them are the dark floor. We know the floor exists from EIA aggregate counts and from individual operators' SEC filings. We do not know it well-by-well.

If you assume Eagle Ford follows the Anadarko shape (8% modern horizontals as a top layer, 92% conventional vertical legacy underneath), the basin's expected dark share is in the high 80s. We document the assumption, not the verdict.

12 · What dark data IS

Architecture is the verdict.

The first five basins demonstrated that dark data is a measurable property of a basin's drilling history. The Williston's lit layer is genuinely big. Anadarko's is genuinely small. Kansas's pre-digital legacy is genuinely overwhelming.

This basin demonstrates the second property. Dark data is also a measurable property of the regulator. The drilling history of the Eagle Ford could be perfectly recorded inside TX RRC's databases (it largely is) and still be dark to the public, the academic community, the underwriting industry, and the policy teams that need it. The wells exist. The architecture is the question.

The dark-data problem is not solved by waiting for technology. It is not solved by waiting for the operators to file better. It is solved by the regulator choosing to ship.

13 · What this basin is for

Six basins is enough to finalize the thesis.

Five measurable basins gave us a falsifiable claim: horizontal eras only rescue young basins where horizontals dominate the well count. Williston (46% dark) confirms it. Permian (78%), Anadarko (91%), Kansas (94%), Appalachia (95%) refute the alternative. The thesis holds across every basin where we can see the wells.

One unmeasurable basin gives us the second claim: the dark-data problem is structural. It does not respond to drilling-era timing. It responds to the regulator's data publishing choice. Texas chose. The choice is the basin.

14 · The series, one chart

Six basins, 1.92 million wells, one Texas.

Williston · 2010+ horizontal era dominant · 46%
Permian · NM lit, TX dark, Delaware basin core · 78%
Anadarko · SCOOP/STACK on a 60-year vertical legacy · 91%
Kansas · Pre-2000 vertical conventional, no horizontal era · 94%
Appalachia · Drake 1859, 165 years, three regulators · 95%
Eagle Ford · TX RRC: JSF + EBCDIC, no public bulk pull · opaque

The pattern is in the gap. The five legible basins land between 46% and 95% dark, set by drilling-era timing. The sixth basin lands at "we cannot see," set by regulatory choice. Both are dark-data verdicts. Both are real. Neither is in tension with the other.

15 · The tell

Dark data is a choice the regulator made.

The first five basins showed what the choice looked like under different drilling histories. Williston, where the horizontal era was big enough to overwhelm the legacy. Anadarko, where it was not. The output of the choice changes with the geology and with the calendar.

Eagle Ford shows what the choice looks like when it has been made. North Dakota chose a shapefile. Oklahoma chose a CSV. Pennsylvania chose a flag column. Kansas chose a survey. Texas chose to keep the mainframe.

That last sentence is the thesis of the entire series. We needed six basins to write it.

The series in one sentence.

The US oil and gas industry's relationship to its own data is not a continuum of technical capacity. It is a binary product of regulatory choice, propagated through a calendar of drilling-era timing. Eight regulators chose to publish what they had. One did not. The first five basins let us measure the dark fraction at varying levels (46 to 95 percent) and prove that horizontal eras only rescue the basins where horizontals dominate the well count. The sixth lets us prove that the dark-data problem is not a technology gap or an operator failure. It is a regulator's publishing posture. Drake's well went dark in 1861 because there was no system to remember it. The Eagle Ford's wells are dark in 2026 because the system that does remember them does not let anyone outside the building look in.

Where this series goes

Six basins, 1.92 million measurable wells, one architectural wall, one falsifiable thesis. The series ends here on the substance side. The work it points to begins in the rest of the country.

  • For operators & modellers. Type-curve work in any basin in this series rests on the lit fraction documented here. The dark fraction sits in PDFs, scanned tax forms, EBCDIC mainframe dumps, and operator filing-cabinet rooms that no one has opened since 1992. Those wells carry the stratigraphic context the modern wells were optimized against. They are not indexed. They are findable.
  • For acquirers & underwriters. Plug-and-abandon liability in the legacy fractions, especially in Texas, is the canonical environmental price of inheriting a basin. The price is being calculated against an inventory that does not exist publicly. That is a market problem.
  • For regulators & policy teams. The eight publishers spent years building data architectures that ship wells. The cost of that decision was real. The benefit, six years later, is six basin walks, one open thesis, and the hard data underneath it. Texas is a publishing decision away from joining.

I build the pipelines, basin-clip analyses, and AI-assisted subsurface QC that close this kind of gap, across basins, regulators, and the geological detail that still matters. If you work any side of this, let's talk.

Sources: USGS ScienceBase item 5d1246c2: Eagle Ford Group Assessment Unit boundaries. EIA Drilling Productivity Report Eagle Ford monthly time series, January 2007 onward. US Census TIGER 2024 cartographic boundary files for state and county outlines. Regulatory accessibility scores reflect the experience of pulling each basin in this six-basin series, January 2026 to April 2026. Built with deck.gl and scrollama. Analysis: @salamituns.