#1
Tips for large volume data mining
Finally... the interminable waits for PT to load hands are over (for me at least)!
I use PT, PAHUD and SixthSense (at least until SpadeEye improves sufficiently). I used to build massive PostgreSQL databases, and loading new hands into them took forever. Now I don't keep any large DBs, nor a large number of DBs, even though I still collect a large number of observed hands - and I can easily use data from millions of them. Here is what I do now:

1) use PAHUD's cache DB, but without automatic updates
2) use SixthSense's PTAgg.exe to create the PTAgg DB
3) every week, back up my PT DB of observed hands and then delete all the hands

To prevent PAHUD from re-building the cache, you MUST NOT add/remove any of the DBs that are used to construct its cache, AND I've read that you also should not change the filters (not sure about this part). What I do is continually re-use the same DB, but remove all the observed hands every week. I use a little script to do that (see the PTCLEAR script at the end of this post). Of course, if you wish to be able to recover from a mis-step or a hardware/software malfunction, you must BACKUP your DB before using PTCLEAR. I would also STRONGLY recommend making backups of the PTAgg and PAHUD caches when you do your weekly backup of the observed hands.

Why do it this way? Continually using the same DB makes my (and hopefully, your) life a lot easier than managing several DBs or building unwieldy monster DBs. It works well because both PAHUD and SixthSense keep track of the highest game number they have already processed; when their cache update programs run, they process only those hands with a higher game number. Using the same DB with continuously advancing game numbers makes both of these cache update methods run smoothly and efficiently. The PTCLEAR script preserves the highest game and session numbers, and also the player name table.

This is for ring games only. I have no knowledge of how to do this for tournament data, or whether it would even be reasonable to do so.
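The weekly routine in step 3 can be sketched as a small shell script. This is a dry run only - it prints the commands rather than executing them, and the DB name ("ptobserved"), the postgres user, and the script filename "ptclear.sql" are all assumptions you'd need to adjust for your own setup. Remove the leading "echo" on each command line to run it for real.

```shell
#!/bin/sh
# Dry-run sketch of the weekly backup-then-clear routine.
# ASSUMPTIONS: DB name "ptobserved", user "postgres", and a local
# file "ptclear.sql" holding the PTCLEAR script below.

DB="ptobserved"
STAMP=$(date +%Y%m%d)
BACKUP="${DB}_${STAMP}.backup"

# 1) back up the observed-hands DB first (custom format,
#    restorable later with pg_restore)
echo pg_dump -U postgres -Fc -f "$BACKUP" "$DB"

# 2) only after the backup succeeds, clear the hands with PTCLEAR
echo psql -U postgres -d "$DB" -f ptclear.sql
```

The point of the ordering is that the truncates in PTCLEAR are irreversible, so the pg_dump must complete before psql ever runs.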
Also, there are quite a few details missing, but I will try to fill them in later in response to the flood of questions this post will likely generate! LOL

One thing I have left out is how I got millions of hands into the PAHUD cache from several DBs... I'm going to charge $100 for that gem! LOL just kidding! It wasn't that difficult... but I'll leave it for a later post because I'm tired right now.

TEST PTCLEAR ON A TEST DATABASE BEFORE USING IT ON REAL DATA!! Hints: make an empty DB, convert it to PostgreSQL, then insert a few hands, and back it up so you can test again after screwing up the first few tries. Learn how to use pgAdmin, find the PostgreSQL docs, learn at least a little SQL, etc, etc, etc....

Enjoy!

LVGamb00ler

PTCLEAR script - run using psql.exe or from within pgAdmin.

Code:

-- save the current highest game_id and session_id
create temp table save_nos (
    game_id    int4,
    session_id int4
) without OIDS;
alter table save_nos owner to postgres;
insert into save_nos (game_id)
    (select game_id from game order by game_id desc limit 1);
update save_nos
    set session_id = (select session_id
                      from session order by session_id desc limit 1);

-- remove all observed hands (the player name table is left alone)
truncate game_players;
truncate session;
truncate player_winnings;
truncate hand_histories;
truncate game;

-- put the saved high-water marks back so the PAHUD/SixthSense
-- cache updaters keep processing only new game numbers
insert into game (game_id) (select game_id from save_nos);
insert into session (session_id) (select session_id from save_nos);

-- reclaim the disk space freed by the truncates
vacuum full game_players;
vacuum full session;
vacuum full player_winnings;
vacuum full hand_histories;
vacuum full game;
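When testing PTCLEAR on a throwaway copy, a quick way to confirm it behaved is to check that exactly one row survives in game and in session, each carrying the old maximum ID. This is a dry-run sketch (it only prints the psql commands); the test DB name "pt_test" and user "postgres" are assumptions.

```shell
#!/bin/sh
# Sanity check after running PTCLEAR on a *test* database.
# ASSUMPTIONS: test DB named "pt_test", user "postgres".
# Each query should report a count of 1 plus the pre-truncate maximum.

DB="pt_test"
echo psql -U postgres -d "$DB" -t -c "select count(*), max(game_id) from game;"
echo psql -U postgres -d "$DB" -t -c "select count(*), max(session_id) from session;"
```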
#2
Re: Tips for large volume data mining
nice post, ty
#3
PA HUD's cache
[ QUOTE ]
use PAHUD's cache DB
[/ QUOTE ]

Where is PAHUD's cache DB located, and what is it called?

Thanks
MKR
#4
Re: PA HUD's cache
Here is the part of the tutorial that covers the database cache:
http://pokeracesoftware.com/hud/tuto...e=cfgprefdbase