Edwin Drake spudded the first commercial oil well in North America in Titusville, Pennsylvania, in 1859. The basin has been drilled, logged, plugged, paved over, and re-drilled for 165 years since. We mapped what its three regulators can still see.
Williston was only 46% dark because horizontals dominated its well count. Anadarko was 91% dark because horizontals were a thin top layer. The Appalachian Basin is the extreme case. The Marcellus and Utica horizontal eras dropped a 30,000-well lit layer onto a vertical legacy that started at Drake and never stopped.
If the Bakken-style rescue scales to age, Appalachia should be in the high eighties dark. If 165 years of paper-era drilling overwhelms even the modern era, it ends up worse.
We clip to the EIA Appalachian Basin shales: the union of Marcellus, Utica, Devonian (Ohio), and Chattanooga shale polygons, published December 2021. It covers 173,000 square miles across PA, WV, OH, NY, KY, TN, MD, and VA.
For v1 we publish PA + WV + OH, where 95% of the basin's drilling activity sits. New York's moratorium kept the play out. Kentucky and Tennessee carry under 5,000 wells each in the polygon. Maryland and Virginia carry slivers. They are documented but not in this cut.
USGS ScienceBase was timing out on every Appalachian Basin Province query the night this was built. The EIA shale envelope is the working substitute. It traces the same play boundaries.
The PA DEP publishes its full well inventory through PASDA: a 24 MB shapefile carrying API, operator, well type, well configuration, spud date, and a single column called UNCONVENTI: Y or N. Marcellus and Utica horizontals are Y. Everything else is N. There is no proxy in PA. The regulator decided.
A second PA DEP file ships 30,527 historic wells digitized from WPA mapping efforts and the USGS K Sheet and H Sheet. Drake-era through pre-1900s drilling, mostly in the northwest. All counted dark, all bucketed pre-1980.
PA is the only state in this series whose regulator publishes its own dark/lit answer. The other states need a proxy. PA hands you the column.
WV DEP publishes a 2016 well location file: 114,259 records with PERMIT_ID, operator, county, well status, UTM coordinates. The file uses 6-digit permit numbers as the primary key.
WV DEP also publishes a 2024 Q4 horizontal H6A production roll: 3,305 wells flagged horizontal Marcellus or deep Devonian, keyed on 10-digit API numbers.
The two files do not share an identifier. The state does not publish a cross-reference. That is the dark-data verdict for West Virginia: the well-location data and the horizontal-production data both exist, and both are useful, and neither can be joined to the other by anyone outside the agency.
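A minimal sketch of the check behind that verdict: compare the key shapes of the two files and count exact matches. The function name and the sample identifiers are hypothetical, invented for illustration; they are not real WV permit or API numbers.

```python
from typing import Dict, Iterable


def join_report(left_keys: Iterable[str], right_keys: Iterable[str]) -> Dict[str, object]:
    """Summarise whether two ID columns can be joined on exact equality."""
    left, right = set(left_keys), set(right_keys)
    return {
        # Distinct string lengths seen on each side (e.g. [6] vs [10])
        "left_key_lengths": sorted({len(k) for k in left}),
        "right_key_lengths": sorted({len(k) for k in right}),
        # Number of identifiers present in both files
        "matches": len(left & right),
    }


# Hypothetical sample values: 6-digit permit IDs vs 10-digit API numbers
permit_ids = ["001234", "005678"]
api_numbers = ["4710300001", "4710300002"]
print(join_report(permit_ids, api_numbers))
```

When the key lengths differ on every row, an exact-match join can return nothing, which is the situation the text describes for the two WV files.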
We count WV at 100% dark on this map. That number is not a measurement of paper. It is a measurement of database architecture.
OH DNR's Division of Oil and Gas Resources Management publishes its public well layer as a queryable ArcGIS REST endpoint at gis.ohiodnr.gov. No bulk ZIP. We pull it 1,000 records at a time across 243 paginated requests.
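A minimal sketch of that pagination loop, not the pipeline's actual code. It assumes the layer honours the standard ArcGIS REST query parameters `resultOffset` and `resultRecordCount`. The HTTP call itself is left to a caller-supplied `fetch_page` function (a hypothetical name), so the layer URL is not guessed at here.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 1000  # records per request, as described in the text


def page_params(offset: int, page_size: int = PAGE_SIZE) -> Dict[str, str]:
    """Build query parameters for one page of an ArcGIS REST layer query."""
    return {
        "where": "1=1",                       # no filter: request every record
        "outFields": "*",                     # all attribute columns
        "resultOffset": str(offset),          # how many records to skip
        "resultRecordCount": str(page_size),  # how many records to return
        "f": "json",
    }


def fetch_all(fetch_page: Callable[[Dict[str, str]], List[dict]],
              page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every feature, requesting pages until a short page comes back."""
    offset = 0
    while True:
        features = fetch_page(page_params(offset, page_size))
        yield from features
        if len(features) < page_size:
            break  # a short (or empty) page means the layer is exhausted
        offset += page_size
```

With 242,035 records and 1,000 per page, this loop makes 243 calls. The last page carries the remaining 35 records. A real pull would probably also want a stable sort field so pages do not shift between requests.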
The OH layer ships a SLANT field with single-letter codes (V, H, D, O) and depth flags for Marcellus_Shale and Utica_Shale formation tops. Lit means SLANT=H or one of the formation depths is non-zero. 4,670 OH wells out of 242,035 clear that bar.
Ohio's Utica horizontal era began around 2011, lagging the PA Marcellus by three years. The state's lit signal is concentrated in a four-county arc through Carroll, Harrison, Belmont, and Monroe.
PA dark = UNCONVENTI is N AND well configuration is not horizontal, OR well is in the historic WPA inventory.
WV dark = present in 2016 well location data (a state-level architecture verdict, not a paper-records verdict).
OH dark = SLANT is not H AND no Marcellus/Utica formation depth recorded.
Asymmetric proxies, three different verdicts, but the basin-wide answer converges anyway. That convergence is the headline.
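The three rules above can be written as predicates. This is a sketch under stated assumptions: the function and parameter names are ours, the well-configuration text is assumed to contain the word "horizontal" for horizontal wells, and a missing or zero formation depth is treated as "no depth recorded".

```python
from typing import Optional


def pa_is_dark(unconventional_flag: str, well_configuration: str,
               in_historic_wpa: bool = False) -> bool:
    """PA: dark if UNCONVENTI is N and not horizontal, or if in the WPA inventory."""
    if in_historic_wpa:
        return True
    not_unconventional = unconventional_flag.strip().upper() == "N"
    # Assumption: horizontal wells carry "horizontal" in the configuration text
    not_horizontal = "HORIZONTAL" not in well_configuration.upper()
    return not_unconventional and not_horizontal


def wv_is_dark(in_2016_location_file: bool) -> bool:
    """WV: every well in the 2016 location file is dark (architecture verdict)."""
    return in_2016_location_file


def oh_is_dark(slant: str, marcellus_depth: Optional[float],
               utica_depth: Optional[float]) -> bool:
    """OH: dark if SLANT is not H and no Marcellus/Utica depth is recorded."""
    is_horizontal = slant.strip().upper() == "H"
    # None and 0 both count as "no formation depth recorded"
    has_formation_depth = bool(marcellus_depth) or bool(utica_depth)
    return not is_horizontal and not has_formation_depth
```

Keeping one predicate per state makes the asymmetry explicit: the PA rule reads a regulator flag, the WV rule reads file membership, and the OH rule reads a proxy.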
610,821 wells across the three states fall inside their own borders. 579,324 fall inside the EIA Appalachian shale union: 254,130 in Pennsylvania, 109,666 in West Virginia, 215,528 in Ohio. The largest basin in the series.
What falls outside the polygon is mostly central PA Allegheny Plateau (older Devonian sandstone gas, less drilled), eastern OH Niagara reef structures, and the deep WV Appalachian Plateau. We exclude those wells so the map shows the unconventional petroleum system, not the conventional fringe.
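A minimal sketch of the clip step, using plain ray casting in place of a GIS library. It treats coordinates as planar and ignores polygon holes. The `lon` and `lat` field names are assumptions, and each shale polygon is reduced to a single outer ring for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Ring = Sequence[Point]        # outer boundary of one polygon


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: toggle 'inside' each time an eastward ray crosses an edge."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_union(pt: Point, rings: Sequence[Ring]) -> bool:
    """A point is in the union if it falls inside any one of the polygons."""
    return any(point_in_ring(pt, ring) for ring in rings)


def clip(wells: List[Dict[str, float]], rings: Sequence[Ring]) -> List[Dict[str, float]]:
    """Keep only wells whose surface location falls inside the polygon union."""
    return [w for w in wells if in_union((w["lon"], w["lat"]), rings)]
```

Testing each point against every polygon avoids having to dissolve the four shale outlines into one geometry first, at the cost of extra comparisons per well.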
549,432 of 579,324 wells are dark: 94.8%. Higher than Anadarko's 91%. Higher than Permian's 78%. A shade above Kansas's 94%. The 165 years of vertical drilling overwhelms the Marcellus + Utica era. The horizontal rescue does not scale to age.
If we removed WV from the calculation entirely (the state is dark by data architecture, not by record), the basin lands at about 93.6%. Either way, the answer is the same kind of number.
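The arithmetic behind both shares can be reproduced from the counts quoted in the text. This is a sketch; the variable names are ours. It assumes every WV well in the polygon is counted dark, as the WV rule states.

```python
# In-polygon well counts by state, as quoted in the text
IN_POLYGON = {"PA": 254_130, "WV": 109_666, "OH": 215_528}
DARK_TOTAL = 549_432  # dark wells inside the polygon, all three states


def dark_share(dark: int, total: int) -> float:
    """Dark wells as a percentage of all wells."""
    return 100.0 * dark / total


basin_total = sum(IN_POLYGON.values())
basin_share = dark_share(DARK_TOTAL, basin_total)

# Dropping WV removes its wells from both the numerator and the
# denominator, because every WV well in the polygon is counted dark.
wv = IN_POLYGON["WV"]
ex_wv_share = dark_share(DARK_TOTAL - wv, basin_total - wv)

print(f"basin-wide: {basin_share:.1f}% dark")
print(f"without WV: {ex_wv_share:.1f}% dark")
```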
Pennsylvania has the most lit signal in the basin. The PA DEP UNCONVENTI flag captures roughly 25,000 Marcellus and Utica horizontals, with the Marcellus core running through Bradford, Tioga, Lycoming, and Susquehanna counties.
Ohio's Utica era is real but small. 4,670 lit wells in a state with 242,035 total. WV's 100% is the architectural verdict, not a paper count.
Edwin Drake's well was 69 feet deep, drilled with a steam-powered cable tool, and produced 25 barrels a day. It established the first commercial petroleum industry in North America. Its location, three miles south of Titusville on Oil Creek, is inside our basin polygon.
The Drake well is not in our well count. It was plugged and abandoned in 1861, fifty years before any state required permits, and some seventy-five years before WPA mapping efforts went looking for the wells the industry had forgotten. It is one of an estimated 700,000 to a million unaccounted-for wells in Pennsylvania alone.
The dark-data problem in this basin is not a database issue. It is a memory issue. The wells were drilled before anyone built a system that could remember them.
Pennsylvania flagged its first Marcellus unconventional well in 2005. Ground broke at scale in 2008. By 2014 the state was producing more dry gas than any province in North America that was not the Williston, including more than the entire Gulf of Mexico shelf. 23,237 of those wells are inside our polygon, almost all in northeastern PA's Bradford-Tioga-Susquehanna trend.
Every one of them is lit. The PA DEP indexes them. The operators file digital completion reports because their next pad's design depends on the previous pad's logs. The Marcellus is the model of how a modern petroleum system should keep its records.
25,112 PA wells total are lit, including a few thousand non-horizontal modern conventionals. The other 229,158 PA wells in this cut are dark.
The Utica horizontal era in Ohio began with Chesapeake's first wells in 2011. By 2014 Carroll, Harrison, Belmont, and Monroe counties were the lit edge of the Ohio half of the basin. Northeast of that arc, Cleveland and Youngstown sit on top of pre-1950 vertical Devonian and Berea sandstone wells. South of the arc, Cincinnati and the Hocking Valley are the same.
The Utica wedge is smaller than the Marcellus wedge, started later, and covers a thinner section of the play. Ohio's lit count is 4,670: less than a fifth of Pennsylvania's. The state's 97.8% dark share is the result.
The OH layer ships horizontal-leg geometries on a separate sub-layer (Layer 6 of the public service). Those traces are not on this map; we plot only the surface holes. The lit dots are the surface-hole (SH) locations of horizontal wells.
81% of the basin has unknown spud vintage. WV and OH ship no spud-date column on their public well files, so both states default their entire population to bucket 5, the unknown-vintage bucket. The vintage signal we have is almost entirely Pennsylvania.
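A sketch of how a spud year could be mapped to a vintage bucket. Only the pre-1980 bucket, the 2010+ bucket, and bucket 5 as the default for wells with no spud date come from the text. The two middle boundaries (2000 and 2010) and the numbering of buckets 1 to 4 are assumptions made for illustration.

```python
from typing import Iterable, Optional

UNKNOWN_BUCKET = 5  # wells with no spud date default here


def vintage_bucket(spud_year: Optional[int]) -> int:
    """Map a spud year to a bucket number; None means no spud date on file."""
    if spud_year is None:
        return UNKNOWN_BUCKET
    if spud_year < 1980:
        return 1  # pre-1980
    if spud_year < 2000:
        return 2  # assumed boundary: 1980-1999
    if spud_year < 2010:
        return 3  # assumed boundary: 2000-2009
    return 4      # 2010+


def unknown_share(spud_years: Iterable[Optional[int]]) -> float:
    """Percentage of wells that land in the unknown-vintage bucket."""
    buckets = [vintage_bucket(y) for y in spud_years]
    return 100.0 * buckets.count(UNKNOWN_BUCKET) / len(buckets)
```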
PA's pre-1980 mountain (38,937 wells, mostly historic WPA-mapped) plus its 2010+ Marcellus bar (25,394 wells) is the legible chart. Everything else is regulatory dark, not record-keeping dark.
Pennsylvania's top operators are split between modern Marcellus names (Range Resources, EQT, Chesapeake, Coterra) and conventional legacy plug-and-abandon names (PennGrade Energy, Snyder Brothers). Marcellus operators carry single-digit dark shares. Conventional names average 95%+ dark.
West Virginia's "operators" are dominated by "OPERATOR UNKNOWN" placeholders, plus modern Marcellus operators (Antero, Southwestern, EQT, Diversified) all flagged 100% dark by the architectural rule.
Ohio's top names are a mix of Utica-era horizontal operators (Encino, Eclipse, Ascent) and legacy independents (Artex, Petromax). The Utica names carry sub-30% dark shares, the legacy names 95%+.
"OPERATOR UNKNOWN" carries roughly 14,000 WV wells. That placeholder is the cleanest single category of dark inventory in the entire series so far.
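Per-operator dark shares like the ones above come from a simple group-and-count. This is a sketch that assumes each well record is a dict with an `operator` string and a boolean `dark` flag. Both field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def dark_share_by_operator(wells: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return each operator's dark wells as a percentage of its total wells."""
    total: Dict[str, int] = defaultdict(int)
    dark: Dict[str, int] = defaultdict(int)
    for well in wells:
        # Normalise case and whitespace so spelling variants group together
        name = str(well["operator"]).strip().upper()
        total[name] += 1
        if well["dark"]:
            dark[name] += 1
    return {name: 100.0 * dark[name] / total[name] for name in total}
```

Under the WV rule every operator comes back at 100%, whether the name is a placeholder or a modern Marcellus driller, because the flag is set by file membership rather than by anything in the record.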
The Williston was only 46% dark because its drilling history was dominated by one fifteen-year horizontal era. The Anadarko was 91% dark because its horizontal era was a thin top on sixty years of vertical legacy. The Appalachian Basin is 95% dark because the factors compound: 165 years of legacy, plus three regulators with three different filing protocols, plus a dozen pre-regulatory decades when no one filed anything at all.
The dark-data problem is not solved by waiting for technology. It is not solved by waiting for the regulators to catch up. It is solved by going back, finding the wells, and digitizing them. Every basin in this series tells you the same thing in a different accent. Appalachia tells you in the loudest one.
Five basins in, the curve has hardened. Kansas 94%, Permian 78%, Williston 46%, Anadarko 91%, Appalachia 95%. Williston is the only basin where horizontals dominate the well count, and it is the only basin in the lit majority. Every other basin in the series carries a vertical legacy that the modern era does not have time, money, or political will to digitize. The Marcellus is the most successful unconventional play in modern North America, and inside its own basin polygon it is a thin top layer on a quarter-million wells the regulators inherited from the WPA, the K Sheet, and Edwin Drake.
Where this goes
I build the pipelines, basin-clip analyses, and AI-assisted subsurface QC that close this kind of gap, across basins, regulators, and the geological detail that still matters. If you work any side of this, let's talk.
Sources: PASDA for the PA DEP Conv/Unconv shapefile and the PA Historic WPA wells. WV DEP Office of Oil and Gas for the 2016 well location file and the 2024 Q4 H6A horizontal production roll. OH DNR DOG_Services Oilgas_Wells_public REST endpoint, paginated 1,000 records per request. EIA Tight Oil + Shale Plays Lower 48 Dec 2021, Basin=Appalachian filter. Wells joined in EPSG:4269 (NAD83). Built with deck.gl and scrollama. Analysis: @salamituns.