Scrape data on MtG decks.
This is a hobby project.
It started as card data scraping from MTG Goldfish. Then, scraping of JumpIn! packets info was added. Then, there was some play with Limited data from 17lands, back when I thought I had to bear with the utter boringness of that format (before the dawn of Golden Packs on Arena). [This part has been deprecated and moved to the archive package.] Then, I discovered I don't need to scrape anything, because Scryfall.
Then, I quit (Arena).
Now, the main focus is the deck and yt packages (parsing data on YouTubers' decks from YT video descriptions).
- Scryfall data management via downloading bulk data with scrython and wrapping it in convenient abstractions
- Scraping YouTube channels for decklist-featuring video descriptions (or author's comments), using no fewer than four Python libraries to avoid bothering with Google APIs
- Parsing those descriptions (or author's comments) for decks:
- Pasted text decklists in Arena/MTGO format are parsed into Deck objects
- Links to decklist sites are scraped into Deck objects. 44 sites are supported so far:
- 17Lands
- Aetherhub
- Archidekt
- CardBoard Live
- Cardhoarder
- Cardsrealm
- ChannelFireball
- Deckbox
- Deckstats
- Draftsim
- EDHREC
- Flexslot
- Goldfish
- Hareruya
- LigaMagic (with caveats)
- MagicVille
- ManaBox
- ManaStack
- Manatraders
- Melee.gg
- Moxfield
- MTGArena.Pro
- MTGAZone
- MTGCircle
- MTGDecks.net
- MTGJSON
- MTGMeta.io (defunct, scraped via Wayback Machine)
- MTGSearch.it
- MTGStocks
- MTGOTraders
- MTGTop8
- MTGVault
- PauperMTG
- PennyDreadfulMagic
- PlayingMTG
- Scryfall
- StarCityGames
- Streamdecker
- TappedOut
- TCDecks
- TCGPlayer
- TCGRocks
- TopDecked
- Untapped
- 3 more decklist sites are planned
- Both Aetherhub decklist types featured in YT videos are supported: regular deck and write-up deck
- Both Archidekt decklist types featured in YT videos are supported: regular deck and snapshot deck
- Both EDHREC decklist types featured in YT videos are supported: preview deck and average deck
- Both MTGCircle decklist types featured in YT videos are supported: video deck and regular deck
- All Untapped decklist types featured in YT videos are supported: regular, profile and meta deck
- Both old TCGPlayer site and TCGPlayer Infinite are supported
- Both international and native Hareruya sites are supported
- LigaMagic is the only sore spot: full support would require me to invest in scraping APIs to bypass their Cloudflare protection (the logic to scrape them is already in place anyway)
- All those mentioned above work even if they are behind shortener links and need unshortening first
- Sites that need it are scraped using Selenium
- Link trees posted in descriptions/comments are expanded
- Links to pastebin-like services (as Amazonian does), Patreon posts and Google Docs documents are expanded too and further parsed for decks
- If nothing is found in the video's description, then the author's comments are parsed
- Deck's name and format are derived (from a video's title, description and keywords) if not readily available
- Foreign cards and others that cannot be found in the downloaded Scryfall bulk data are looked up with queries to the Scryfall API
- Individual decklist URLs/HTML tags/JSON data are extracted from container pages and further processed for decks. These include:
- Aetherhub users, events and articles
- Archidekt folders and users
- Cardsrealm profiles, folders, tournaments and articles
- CardKingdom articles and authors
- Cardmarket articles
- ChannelFireball players, articles and authors
- Commander's Herald articles and authors
- CoolStuffInc articles and authors
- CyclesGaming articles
- Deckbox users and events
- Deckstats users
- Draftsim articles and authors
- EDHREC authors, articles and article searches
- EDHTop16 tournaments and commanders
- Flexslot users
- Goldfish tournaments, players and articles
- Hareruya events and players
- LigaMagic events (with caveats)
- MagicVille events and users
- ManaStack users
- Manatraders users
- Magic.gg events
- MagicBlogs.de articles
- Melee.gg profiles and tournaments
- Moxfield bookmarks, users and search results
- MTGAZone articles and authors
- MTGCircle articles
- MTGDecks.net tournaments and articles
- MTGMeta.io articles and tournaments (defunct, scraped via Wayback Machine)
- MTGO events
- MTGRocks articles and authors
- MTGStocks articles
- MTGTop8 events
- MTGVault users
- Pauperwave articles
- PennyDreadfulMagic competitions and users
- PlayingMTG articles and tournaments
- StarCityGames events, players, articles and authors' deck databases
- Streamdecker users
- TappedOut users, folders, and user folders
- TCDecks events
- TCGPlayer (old-site) players
- TCGPlayer Infinite players, authors, author searches, author deck panes, events and articles
- TopDeck.gg brackets and profiles
- Untapped profiles
- WotC (official MTG site) articles
- 88 container pages in total, with 27 more planned
- Assessing the meta:
- Goldfish
- MTGAZone
- (others are planned)
- Exporting decks to XMage .dck format, Forge MTG .dck format or an Arena decklist saved as a .txt file, with autogenerated, descriptive names based on the scraped deck's metadata
- Importing back into a Deck from those formats
- Export/import to other formats is planned
- Dumping decks, YT videos and channels to .json
- Semi-automatic discovery of new channels
- I compiled a list of almost 2.3k YT channels that feature decks in their descriptions and successfully scraped them (at least 25 videos deep), so this data is now just waiting to be put to creative use!

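
To illustrate the pasted-decklist parsing listed above, here is a minimal sketch of reading an Arena-style text decklist. It is not the project's actual parser: the function name `parse_arena_decklist`, the regular expression and the plain dict-of-Counters result (instead of a Deck object) are assumptions made for the example, and treating a blank line as the start of the sideboard is only a heuristic for lists without section headers.

```python
import re
from collections import Counter

# Matches lines such as "4 Lightning Bolt" or "4 Lightning Bolt (M10) 146".
# The set code and collector number are optional.
LINE_RE = re.compile(
    r"^(?P<qty>\d+)x?\s+(?P<name>.+?)"
    r"(?:\s+\((?P<set>[A-Za-z0-9]+)\)\s+(?P<num>\S+))?$"
)
# Section headers assumed for this sketch.
SECTIONS = {"deck", "sideboard", "commander", "companion"}


def parse_arena_decklist(text):
    """Return {section: Counter(card name -> quantity)} for a pasted decklist."""
    sections = {}
    current = "deck"
    seen_card = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Heuristic: in header-less lists a blank line after the
            # main deck starts the sideboard.
            if seen_card and current == "deck":
                current = "sideboard"
            continue
        if line.lower() in SECTIONS:
            current = line.lower()
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a card line, skip it
        sections.setdefault(current, Counter())[match["name"]] += int(match["qty"])
        seen_card = True
    return sections


example = """Deck
4 Lightning Bolt (M10) 146
20 Mountain

Sideboard
2 Fire // Ice (MH2) 290
"""
parsed = parse_arena_decklist(example)
```

With the example input, `parsed["deck"]` holds 4 Lightning Bolt and 20 Mountain, and `parsed["sideboard"]` holds 2 Fire // Ice. Card names found this way would still need to be matched against Scryfall data, which this sketch leaves out.
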
No | Format | Count | Percentage |
---|---|---|---|
1 | commander | 61996 | 38.78 % |
2 | standard | 35176 | 22.00 % |
3 | modern | 17018 | 10.64 % |
4 | pauper | 9682 | 6.06 % |
5 | pioneer | 9178 | 5.74 % |
6 | legacy | 4642 | 2.90 % |
7 | brawl | 3725 | 2.33 % |
8 | historic | 2732 | 1.71 % |
9 | explorer | 2438 | 1.52 % |
10 | undefined | 2387 | 1.49 % |
11 | paupercommander | 2049 | 1.28 % |
12 | duel | 2025 | 1.27 % |
13 | timeless | 1791 | 1.12 % |
14 | premodern | 1067 | 0.67 % |
15 | irregular | 1058 | 0.66 % |
16 | vintage | 916 | 0.57 % |
17 | alchemy | 856 | 0.54 % |
18 | oathbreaker | 353 | 0.22 % |
19 | penny | 300 | 0.19 % |
20 | standardbrawl | 281 | 0.18 % |
21 | gladiator | 105 | 0.07 % |
22 | oldschool | 56 | 0.04 % |
23 | future | 36 | 0.02 % |
24 | predh | 7 | 0.00 % |
TOTAL | | 159874 | 100.00 % |

No | Source | Count | Percentage |
---|---|---|---|
1 | moxfield.com | 73244 | 45.81 % |
2 | mtgo.com | 15297 | 9.57 % |
3 | arena.decklist | 14549 | 9.10 % |
4 | aetherhub.com | 10553 | 6.60 % |
5 | mtggoldfish.com | 10434 | 6.53 % |
6 | archidekt.com | 5734 | 3.59 % |
7 | mtgdecks.net | 4437 | 2.78 % |
8 | melee.gg | 3423 | 2.14 % |
9 | mtg.cardsrealm.com | 3088 | 1.93 % |
10 | mtga.untapped.gg | 3055 | 1.91 % |
11 | tcgplayer.com | 2756 | 1.72 % |
12 | tappedout.net | 1694 | 1.06 % |
13 | streamdecker.com | 1615 | 1.01 % |
14 | mtgtop8.com | 1514 | 0.95 % |
15 | magic.gg | 1493 | 0.93 % |
16 | mtgcircle.com | 1160 | 0.73 % |
17 | mtgazone.com | 715 | 0.45 % |
18 | deckstats.net | 585 | 0.37 % |
19 | hareruyamtg.com | 485 | 0.30 % |
20 | starcitygames.com | 485 | 0.30 % |
21 | flexslot.gg | 419 | 0.26 % |
22 | magic.wizards.com | 378 | 0.24 % |
23 | scryfall.com | 289 | 0.18 % |
24 | pennydreadfulmagic.com | 281 | 0.18 % |
25 | pauperwave.com | 269 | 0.17 % |
26 | cardmarket.com | 264 | 0.17 % |
27 | magic-ville.com | 204 | 0.13 % |
28 | topdecked.com | 187 | 0.12 % |
29 | channelfireball.com | 171 | 0.11 % |
30 | manabox.app | 169 | 0.11 % |
31 | edhrec.com | 160 | 0.10 % |
32 | paupermtg.com | 117 | 0.07 % |
33 | coolstuffinc.com | 108 | 0.07 % |
34 | manatraders.com | 87 | 0.05 % |
35 | mtgsearch.it | 72 | 0.05 % |
36 | tcdecks.net | 55 | 0.03 % |
37 | mtgstocks.com | 53 | 0.03 % |
38 | manastack.com | 40 | 0.03 % |
39 | mtgmeta.io | 38 | 0.02 % |
40 | cyclesgaming.com | 34 | 0.02 % |
41 | commandersherald.com | 32 | 0.02 % |
42 | deckbox.org | 32 | 0.02 % |
43 | cardhoarder.com | 25 | 0.02 % |
44 | mtgvault.com | 25 | 0.02 % |
45 | app.cardboard.live | 19 | 0.01 % |
46 | 17lands.com | 11 | 0.01 % |
47 | draftsim.com | 10 | 0.01 % |
48 | magicblogs.de | 4 | 0.00 % |
49 | mtgarena.pro | 3 | 0.00 % |
50 | mtgotraders.com | 1 | 0.00 % |
51 | playingmtg.com | 1 | 0.00 % |
TOTAL | | 159874 | 100.00 % |
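
The Percentage column in the tables above is each count divided by the grand total. A small sketch of that arithmetic follows; the helper name `breakdown_rows` and its `total` parameter are hypothetical, and only a few rows of the format table are used as sample input.

```python
def breakdown_rows(counts, total=None):
    """Return table rows "No | Name | Count | Percentage |", sorted by count."""
    if total is None:
        total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for no, (name, count) in enumerate(ranked, start=1):
        share = 100 * count / total  # percentage of all scraped decks
        rows.append(f"{no} | {name} | {count} | {share:.2f} % |")
    return rows


# A few counts copied from the format table; the total is the table's TOTAL row.
sample = {"commander": 61996, "standard": 35176, "modern": 17018, "predh": 7}
rows = breakdown_rows(sample, total=159874)
print("\n".join(rows))
```

The first row comes out as `1 | commander | 61996 | 38.78 % |`, matching the table. Because only four formats are passed in, predh is numbered 4 here rather than 24, and its share still rounds to `0.00 %`.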