I am curious to know how you test a deck when you have just created a new one.

What are your criteria to state it works or not?

Mine: I try it once against all lands. If it wins most of them, it goes into my library; otherwise it is dumped. However, I am sure this is a very rough way to test it.

Thank you for this thread; it will help me to systematize my thoughts simply in order to express them. Hopefully I will pick up ideas from other responders as well.

First, it is a little tricky to use the AI to simulate performance vs. human opponents, but I have several reasons for preferring to test vs. the AI.
1.  It fits my schedule.  It is rare I can complete a full game vs. a human without distractions / interruptions.
2.  It protects secrets if I choose to spring a deck in a tournament.  For me, this is probably not a big deal since, unlike Valentino, I am not the kind of threat others would prepare specially to play against.  But it is still nice to spring an occasional surprise.  I like the look on people's faces (well, OK, the look I imagine on people's faces) when they finally realize my eclectic collection of magic-immune silver hawks, lava imps, and the astral armor I cast on my flame spiders was not just to thwart their meteor spells, but to prepare a catastrophe.
3.  It is (arguably, at least until the "choose an opposing deck" option comes out) more reproducible than playing a human.
4.  It offers more variety of opposing decks than most human opponents.  Most humans don't have the 500 (or whatever the current count is) of different decks that the AI has.

So here is my approach:  Ideally (I don't always have time to complete the test, or I may decide the deck needs modification/scrapping before I finish), I play against each region on the map (remember each faction comes with 3 regions).  I may surrender and restart a given match if the opposing deck, in my opinion, is not a meaningful test for my deck.  This can be caused by power cards like mass delusion (which reverses the strength and health of every minion played) or by other AI cheats in deck construction (e.g. 30 wolf packs in a single deck).  I will not necessarily restart for every power card (some actually make the game more meaningful), and which power cards trigger a reset may depend upon the deck I'm testing.  For example, I generally tolerate dragon blood (inflict one damage on every minion that strikes for life damage), but I will reject the match if my deck revolves around low-health minions.  I may also reject a deck if a power card gives me an advantage, e.g. mage academy (which restores one power after every spell cast) when my deck relies heavily upon spells.  During these tests, I observe the following:
1.  Win-loss record.
2.  Comfort level.  Were my wins decisive or narrow?  If the wins were narrow, were they secure (e.g. I lost 15 health because my opponent's weaker but faster minions got a jump on me, but things stabilized once I got minions deployed in every lane), were they swingy (I reduced my opponent to zero health the turn before his minions would have killed me), were they lucky (my opponent and I each had 2 health, he had 4 unblocked big minions, but on my turn I drew a heat seeker), or were they due to AI deficiency (the AI chose to invoke his unopposed giant volta's special to kill my adjacent undead triton rather than attacking to kill me)?  Were my losses solid showings (I just couldn't overcome my opponent's champion 8/8 bone dragon, but I could keep it blocked and deal with my opponent's other cards for a good span of time), or were they pathetic (everything I played was bested on my opponent's next turn)?
3.  Play observations.
     A. Did the deck theme play out effectively?
     B. Did the cards I played tend to out-match my opponent's cards (card or power-point advantage)?
     C. Was I frequently wasting power points when I needed to be making plays?
     D. Was I frequently wanting something (more cards, a minion, a particular aura)?
     E. Were there situations that the deck handled poorly?  Vulnerabilities that became apparent?
     F. Were there disappointing cards (cards I never wanted to play, cards insufficient for their intended tasks, etc.)?
     G. Was the deck very vulnerable to chance (e.g. card order)?
     H. Was the deck fun to play?
4.  Reflections:
     A. How might the deck play differently vs. human opponents?
     B. What is likely to really cause the deck problems?
     C. What cards, if any, are crucial?  Which are exchangeable?
     D. Is the deck too predictable?  Too narrow?
     E. Is the deck truly original or merely a rehash of someone else's deck?