Busy, Shmizzy

MyLife, MyRamblings, Tech, Code

by Maria on 21 Oct 2014 - 06:43  

So, last post, I had been told by someone at a user group that I could not become a great programmer working by myself. I really love my job, so I set out to find a way to do exactly that.

As I thought about my predicament, I thought, sheesh, there must be hundreds of people just like me at the university in exactly the same pickle, all of us working mostly by ourselves in research labs all over campus, and probably a good percentage of us self-taught. I started poking around the UW website, and was surprised to find no sort of network of developers. So, I started one. In May of this year, I began trying to figure out how to track down fellow developers at the UW, and it turns out this is no easy task. But, as of October there are 87 subscribers, so I'm making progress. If you know any software developers at the UW, please send them to this site:

https://mailman1.u.washington.edu/mailman/listinfo/research_lab_devs

to subscribe to my mailing list.

We have started having regular meetings as well. It has been a lot of fun. We have been looking at code, and talking about research and software development. I started my list at an opportune time, because others were also feeling there was a void. There is now an organization at the UW called eScience, and they are very interested in improving coding practices in science at the UW. When they found out about our group, they volunteered to help out. Currently they help with organization and bring snacks to our meetings, total win! Additionally, as a community we are receiving many awesome opportunities. For example, in November, I and many others on the list will be attending a Software Carpentry Instructors training.

https://benmarwick.github.io/2014-11-12-training/

Which I am really looking forward to. Science and coding, why not do both well? Plus, we get to do this:

09:15: Teaching as a performance art (2)

So we can share the love.

I have been looking for ways for our group to meet on a regular basis to do some live coding, and I am contemplating starting a coding series of sorts. My current idea is that I'd like to take this book:

Head First Design Patterns

By Eric Freeman, Elisabeth Robson, Bert Bates, Kathy Sierra

and go through it as a group. Each time we meet we would talk about one or more patterns, and talk about how it translates into the various languages that people in the group know, and hopefully do some group or pair coding and share it.

So, if you have tried something similar, I'd love to hear how it went! Or if you have ideas of other things that have worked with your group, I'd like to hear that too. Finally, we are always looking for speakers that have experience in the juncture of code and science, especially with incorporating best practices, so feel free to drop me a line if you want to help or come talk!

In addition to this group I have formed, I have just started TAing for an Introductory Python Course, because there is no better way to really learn material than to teach it!

I don't know if I am becoming a great programmer, but I am learning a lot. Maybe not as quickly as if I were working daily with other developers, but I get to keep my cool job, and still learn more about best practices and about coding from other developers, so I'm pretty sure this is the appropriate response:



Unit Testing with Panda3D

Code, Python, Panda3D

by Maria on 28 Jan 2014 - 01:40  

I've been playing around quite a bit with Panda3D lately. Panda3D is a game engine originally created by the Disney Corp.; it's a framework for 3D rendering and game development for Python and C++ programs. When I was first learning Panda3D, it was not obvious to me how to do unit testing, and I didn't see any documentation about it, so I thought I would share what I have learned. I think there are a few keys, which I will go over briefly, and then share the code.

1. Don't render to screen, but don't use window-type 'none' either. The one bit of advice I could find about unit testing with Panda3D suggested using window-type none, but if you do this your camera node becomes None, so you will probably have to make changes to your code. Instead use window-type offscreen.

2. Use a setUpClass to instantiate your game. Panda3D has a lot of overhead that you don't want to reload every time you run a test.

3. Have a setUp method to re-configure your game to its starting state. Reload any configuration files here, and move players and objects to their starting positions. This way you are always starting in the same state without having to re-instantiate your game. This does mean that you should separate setting up the game scaffolding from the initial configuration in your game file, but this is a good idea anyway.

4. The only real change you should need to make to your code is including an if __name__ == "__main__": conditional at the end. This will allow you to instantiate the game and start the task manager running automatically if you start the game from the command line, but won't if you import the class as a module.

5. You can step the task manager directly from the test code, allowing you to run the game for just as long or short as you need to, in order to test the logic for that particular test.
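To make the shape of tips 2 through 5 concrete without requiring Panda3D, here is a minimal sketch in Python 3. FakeGame is a hypothetical stand-in, not Panda3D code and not the code from my repository; in a real game its constructor would be where the window-type offscreen setting and the engine start-up go, and its step method would be where the task manager gets stepped.

```python
import unittest


class FakeGame(object):
    """Hypothetical stand-in for a Panda3D game class (not real Panda3D).

    Assumption: in a real project the constructor would configure
    window-type offscreen and start the engine, and step() would step
    the task manager once.
    """

    def __init__(self):
        self.frame = 0
        self.player_x = 0.0

    def reset(self):
        """Return the game to its starting configuration."""
        self.frame = 0
        self.player_x = 0.0

    def step(self):
        """Advance the game by exactly one frame."""
        self.frame += 1
        self.player_x += 0.5


class TestFakeGame(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Tip 2: pay the expensive start-up cost once for the whole class.
        cls.game = FakeGame()

    def setUp(self):
        # Tip 3: put the game back in its starting state before each test.
        self.game.reset()

    def test_a_player_moves(self):
        # Tip 5: step only as many frames as this test needs.
        for _ in range(4):
            self.game.step()
        self.assertEqual(self.game.player_x, 2.0)

    def test_b_state_was_reset(self):
        # Runs after test_a (alphabetical order), yet starts from frame 0.
        self.assertEqual(self.game.frame, 0)
        self.assertEqual(self.game.player_x, 0.0)


def run_suite():
    """Run the tests without exiting the interpreter and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFakeGame)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    # Tip 4: this only runs when started from the command line, not on import.
    run_suite()
```

The same game object is shared by both tests, and the second test still sees a fresh starting state because setUp resets it.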

So that I don't have to worry about updating the website if I change the code, I'll point you directly to my GitHub repository for the code. Happy testing!

https://github.com/codedragon/panda3d_unittest_example



More Tries in Python

Code, Python, Tech

by Maria on 13 Sep 2013 - 06:40  

In the previous blog post we tried to figure out what the Python code I found on Wikipedia, which was supposedly returning an iterator over the items of the Trie, was actually doing, and then we sort of ran with that to gain some understanding of Tries, generators, and recursion in Python. And that was great, but in the end we didn't really have anything useful. Our method returned all of the letters in our Trie, one at a time. It seems it would be more useful if it returned all of the words in our tree, so let's try doing that. Since we have a goal, let's create a test so we can tell if we have reached our goal. Here is our test code:

import unittest
from trie import Trie

class TestWords(unittest.TestCase):
    """Tests for function words"""

    def setUp(self):
        unittest.TestCase.setUp(self)

        self.mytrie = Trie()
        self.mytrie.add('ant',1)
        self.mytrie.add('ante',2)
        self.mytrie.add('antic',3)
        self.mytrie.add('antsy',4)
        self.mytrie.add('antse',5)
        self.mytrie.add('ban',6)
        self.mytrie.add('banana',7)

    def test_default_case(self):
        """Test words retrieves all words properly from Trie."""
        expected = ['ant','ante','antic','antsy','antse','ban','banana']
        actual = []
        for words in self.mytrie.words():
            actual.append(words)
        print 'actual', actual
        print 'expected', expected
        self.assertTrue(sorted(actual)==sorted(expected))

if __name__ == '__main__':
    unittest.main(exit=False)

I've made a fairly complicated, but relatively small, Trie, to really give our code a run for its money. And yes, I even made up a word. And, of course, our method fails miserably. So, let's see about getting some words instead of letters.

We know that many of our words have prefixes in common, that is the point of creating a tree like this. So, the general idea is going to be to collect letters and re-use them. To begin, let's just start collecting letters, and spitting out what we have when we reach the end of a word. We can tell if we are at the end of a word by whether there is a node.value.

    def words(self, prefix = []):
        """Return an iterator over the items (words) of the 'Trie'."""
        word = False
        for char, node in self.root.iteritems():
            prefix.append(char)
            if node.value:
                yield ''.join(prefix)
            for i in node.words():
                yield i
 

And, our test fails!

actual ['ant', 'antic', 'anticsy', 'anticsye', 'anticsyee', 'anticsyeeban', 'anticsyeebanana']
expected ['ante', 'antic', 'ant', 'antsy', 'antse', 'banana', 'ban']
F
======================================================================
FAIL: test_default_case (__main__.TestWords)
Test words retrieves all words properly from Trie.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_trie.py", line 27, in test_default_case
    self.assertTrue(sorted(actual)==sorted(expected))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

I have printed out the expected and actual list of words, and we can see immediately what the problem is. We need to figure out how to delete the letters we no longer need. How do we decide when we should delete a letter? Let's take a look at our Trie:

It looks like we will need to delete letters when we come to the end of a word, but only if that letter is at the end of a branch. How will we know? And how many letters will we delete? When we get to the end of the word antsy, we will need to delete just the y, but when we get to the end of the word antse, we will need to delete both the e and the s. (We actually have no way of knowing which branch will be traversed first; since the tree is based on a dictionary, the order in which it traverses the nodes is arbitrary.)
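A side note on why the letters carried over from one recursive call to the next at all: prefix is a default argument, and Python creates a default value once, when the function is defined, not once per call. Here is a minimal sketch in Python 3; the function names collect and collect_fresh are made up for illustration.

```python
def collect(letter, prefix=[]):
    """Append a letter to the shared default list and return the text so far."""
    # The same list object is reused on every call that omits `prefix`.
    prefix.append(letter)
    return "".join(prefix)


def collect_fresh(letter, prefix=None):
    """Same idea, but a new list is created whenever none is passed in."""
    if prefix is None:
        prefix = []
    prefix.append(letter)
    return "".join(prefix)
```

Calling collect('a') and then collect('n') gives 'a' and then 'an', while collect_fresh('a') and collect_fresh('n') give 'a' and 'n'. Our words method is leaning on the first behavior, which is why we have to clean the list up ourselves.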

What would be really useful at this point is if we knew at each branch how many words share that prefix. We could certainly figure this out by traversing the tree once, and then using the information when we traverse the tree again to get the words. But, it seems like this might actually be useful information, in and of itself. So, maybe this is something that really should already be available. This is the advantage with playing around with your proposed data structure before production use. We can still go back and add functionality. So, let's put some new code in our add function to keep track of the number of words using a given prefix. The first two methods of our class now look like this:

from collections import defaultdict

class Trie:
    def __init__(self):
        self.root = defaultdict(Trie)
        self.value = None
        self.count = 0  # number of words that pass through this node

    def add(self, s, value):
        """Add the string `s` to the
        `Trie` and map it to the given value. Additionally, keep track of
        how many words each node is shared with.
        """
        head, tail = s[0], s[1:]
        cur_node = self.root[head]
        cur_node.count += 1
        if not tail:
            cur_node.value = value
            return  # No further recursion
        self.root[head].add(tail, value)

Very simple addition, which is going to make our life much easier. We may also want to create a method that returns the number of words in our tree that share a particular prefix we may be interested in, but for now, let's finish our iterator.
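For the curious, the prefix-counting method mentioned above could look something like this. It is a sketch in Python 3, and the name prefix_count is my own invention for illustration, not part of the Wikipedia code.

```python
from collections import defaultdict


class Trie:
    def __init__(self):
        self.root = defaultdict(Trie)
        self.value = None
        self.count = 0  # number of words that pass through this node

    def add(self, s, value):
        """Add the string `s`, counting it at every node along the way."""
        head, tail = s[0], s[1:]
        cur_node = self.root[head]
        cur_node.count += 1
        if not tail:
            cur_node.value = value
            return
        cur_node.add(tail, value)

    def prefix_count(self, prefix):
        """Hypothetical helper: how many stored words start with `prefix`."""
        if not prefix:
            # The top-level Trie has no count of its own, so add up its children.
            return sum(child.count for child in self.root.values())
        node = self
        for char in prefix:
            # .get() avoids creating empty nodes in the defaultdict on a miss.
            node = node.root.get(char)
            if node is None:
                return 0
        return node.count
```

With the seven words from our test, prefix_count('ant') would be 5 and prefix_count('ants') would be 2, which are exactly the numbers we are about to lean on.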

For every letter that we find, we now know how many words use that letter, so if we keep track of how many words we find that are using each letter, we can compare the two. When we reach the end of a word that is also the end of a limb, we need to delete at least to the last branching point, but possibly farther. We traverse A-N-T-E and we have hit an endpoint. The E only has one word associated with it, so we can delete it, but the T has 5 words associated with it, so we need to check if we have finished 5 words yet. We can't just keep one running tally of words finished, because sometimes we will be keeping track of multiple branching points. For example, we need 5 words for the T, but only 2 for the S, but can be working on completing both of those branches at the same time. Clearly, we need to keep a list of words finished for each letter. We start at the A, we have finished no words, [0], continue to the N, [0, 0], and now to the T, [0, 0, 0], but now we have finished a word, so we go back and add 1 to everything [1, 1, 1]. Let's go to the E, we have [1, 1, 1, 0], and once again we have finished a word, so we now have [2, 2, 2, 1]. We are now at the end of a limb, so we delete letters.

But, wait, how did we know we are at the end of a limb? Well, we know we are at the end of a word, since we can check to see if there is a value associated with it. And if we check the node.count at this point, we can see that this letter is associated with just one word, so we must be at the end of a limb. Great, we can delete letters. How many? We have to check. We delete the E, then check the T. The T has 5 words associated with it, but we have only 2 recorded for the T, so we can't delete it yet. We are done deleting, as no earlier letters can possibly be ready for deleting yet. Now we go to the I, but wait, our letters-visited-list is [2, 2, 2, 1]. Hmm, looks like we need to remember to delete the 1 at the same time that we delete the letter. Makes sense, as this list should correspond to our prefix list. We are at [2, 2, 2, 0] now. We go to the C, and we are at the end of another word, so the array becomes [3, 3, 3, 1, 1]. We check our node.count, and it is 1, so we are at the end of a limb. Delete letters at the end of our array until we get back to the T, which is still a 5, and we are only at 3. Great, we seem to have the hang of this. But, we have now checked the T twice, and you may remember that we are not actually going to have access to that node when we are backing out. You can tell this is true, because when we were getting the letters, and not deleting any of them, we did not see multiple T's. The final word was 'anticsyeebanana', so the algorithm was just tacking on more endings, and not re-visiting previous letters.

What would make more sense anyway is to keep an array of the nodes we have visited, and what their corresponding word counts should be. So, that list would now look like [5, 5, 5, 1, 1]. Now we just compare the arrays and when numbers are equal, get rid of those letters. Well, we still must only check when we are at the end of a limb. But, this seems a little over-complicated too. What if we just collected the word count array from the nodes, and subtracted 1 from the whole array whenever we hit the end of a word? Any letters currently in the array would be ones that this word would be a part of, since it is just a representation of the letters in the prefix array. Any zeros would represent the letters (and positions in the word count array) we need to delete. Let's try it.

    def words(self, path = [], prefix = []):
        """Return an iterator over the words of the 'Trie'."""
        # Note: path and prefix are mutable default arguments, so every
        # recursive call below shares the same two lists. This version
        # relies on that sharing.
        for char, node in self.root.iteritems():
            prefix.append(char)
            path.append(node.count)

            # if there is a node.value, then we are at the end of a word and
            # we should yield the word and subtract 1 (word!) from the
            # entire node path
            if node.value:
                yield ''.join(prefix)
                for x, y in enumerate(path):
                    path[x] = y - 1

            # if we are at the end of a word and the count is 1,
            # we are at the end of a branch, and it is time to
            # delete letters back to the last branch we haven't
            # gone down yet.
            if node.value and node.count == 1:
                for j in range(path.count(0)):
                    del path[-1]
                    del prefix[-1]

            for i in node.words():
                yield i

And, let's run our test.

actual ['ant', 'antic', 'antsy', 'antse', 'ante', 'ban', 'banana']
expected ['ante', 'antic', 'ant', 'antsy', 'antse', 'banana', 'ban']
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

I left the print statements in so you could see the words. So, at this point, we should create a bunch more tests to check more edge cases.
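If you want to watch just the bookkeeping, without the Trie or the recursion in the way, here is a hand-driven sketch in Python 3. The walk is typed in by hand in one possible traversal order, and the names finish_word, trace and WALK are hypothetical helpers for this illustration only. It also trims at the end of every word rather than checking node.count == 1 first, which is a simplification of the method above.

```python
def finish_word(prefix, path):
    """Record one finished word, then trim letters no remaining word needs."""
    word = "".join(prefix)
    # Every letter currently in the prefix is part of this word.
    path[:] = [count - 1 for count in path]
    # Zeros mark letters that no unfinished word still needs.
    for _ in range(path.count(0)):
        del path[-1]
        del prefix[-1]
    return word


def trace(steps):
    """Replay a walk; each step is (letter, node_count, ends_word)."""
    prefix, path, words = [], [], []
    for letter, count, ends_word in steps:
        prefix.append(letter)
        path.append(count)
        if ends_word:
            words.append(finish_word(prefix, path))
    return words, prefix, path


# One possible traversal order for the seven test words.
WALK = [
    ("a", 5, False), ("n", 5, False), ("t", 5, True),  # ant
    ("e", 1, True),                                    # ante
    ("i", 1, False), ("c", 1, True),                   # antic
    ("s", 2, False), ("y", 1, True),                   # antsy
    ("e", 1, True),                                    # antse
    ("b", 2, False), ("a", 2, False), ("n", 2, True),  # ban
    ("a", 1, False), ("n", 1, False), ("a", 1, True),  # banana
]
```

Running trace(WALK) gives back the seven words in the order walked, and both lists end up empty, which is a nice sanity check that every letter got deleted exactly once.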

Do you suppose this is what the author was trying to do with the code in the Wikipedia article? Is there any way to do this without including the node.count or checking the tree twice?
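One possible answer to that last question, offered as a sketch rather than as what the Wikipedia author had in mind: hand each recursive call its own copy of the prefix, so there is nothing to delete afterwards. This is written for Python 3 (items() instead of iteritems(), and yield from), and it leaves out the count entirely.

```python
from collections import defaultdict


class Trie:
    def __init__(self):
        self.root = defaultdict(Trie)
        self.value = None

    def add(self, s, value):
        """Add the string `s` to the Trie and map it to the given value."""
        head, tail = s[0], s[1:]
        cur_node = self.root[head]
        if not tail:
            cur_node.value = value
            return
        cur_node.add(tail, value)

    def words(self, prefix=""):
        """Yield every word in the Trie, building each prefix on the way down."""
        for char, node in self.root.items():
            word = prefix + char  # a new string, so siblings never see it
            if node.value is not None:
                yield word
            # Keep going: longer words may continue past the end of this one.
            yield from node.words(word)
```

Because strings are immutable, prefix + char makes a new string for each branch, and backing out of a branch simply forgets it. The trade-off is that a new string is built at every node.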

You can find my code on GitHub.

Finally, here is a nice post about how Tries are useful for word storage for computer versions of Boggle, without ever mentioning the word Boggle.



Python: Recursion, Generators and Tries

Code, Python, Tech

by Maria on 05 Sep 2013 - 04:25  

I was learning about Tries, and writing some code to do fun stuff with them, and along the way I ran into some code on Wikipedia that seems faulty, and also came up with a silly way to explain recursion. So, I thought I'd share.

First the silly explanation of recursion:

Let's say you have this wooden sign, but somehow the letters were put on the sign backwards, so your 'welcome' sign says 'emoclew'. So, you put it in your magic box. The box has boxes inside boxes, each with one door leading to the next inner box, kind of like a Russian doll with doors. The magic part is that the inner boxes and doors only appear when you feed in a piece of wood with at least one letter on it. The sign goes through the first door, and the first letter (e) is cut off and set aside, and the rest of the sign continues to go through doors. Each time a letter is cut off, set by the door, and another door magically appears, until there is just one letter on your piece of wood. At this point no more doors appear, and the wood is sent back through the door(s) it came in. Each time it goes out of a door, the letter from that box is put back on the piece of wood, but now to the other side of the piece of wood. So as it goes back through the doors it looks like this: 'w' -> 'we' -> 'wel' -> 'welc', etc., until it pops out of the last door, and you have your sign the way you want it. Or, you could just repaint the sign. ;-)

This is pretty much how recursion works. The important bit is that the data is processed both on the way in and on the way out, and each step does exactly the same thing. This is how that code looks:

def reverse(s):
    if s == "":
        return s
    else:
        return reverse(s[1:]) + s[0]

reverse('emoclew')

The first time it goes in, there is a string there, so it goes to the else statement and runs reverse('moclew') + 'e'. The 'e' just hangs around by the door waiting for the return to be completed, while the 'moclew' goes through the next door. Same thing with the next run, the 'm' hangs out by the door, while the 'oclew' goes through the next door. Once we get to the last letter, the 'w' waits by its door too, and what goes through the final door is an empty string. There is nothing left to cut off, so no new door appears, and the empty string gets kicked straight back out. This is the 'return s' line. But, we still have to go back out the doors we went through, and at each one we finish that door's return reverse(s[1:]) + s[0]. At the innermost door that gives us '' + 'w', so 'w' comes out. At the next door, reverse(s[1:]) is the 'w' that just came back, and s[0] is the 'e' we left hanging out, so we add them together to get 'we'. Going back through the next door, the reverse(s[1:]) is now 'we' and we add the 'l', continuing until we once more have our welcome sign, letters correctly written.
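If you would rather watch the doors than read about them, here is the same function with print statements added, as a sketch in Python 3. The name reverse_traced and the depth parameter are additions for illustration; the logic is otherwise the reverse function above.

```python
def reverse_traced(s, depth=0):
    """Same logic as reverse(), but prints each door on the way in and out."""
    indent = "  " * depth  # deeper doors are indented further
    print(f"{indent}in : {s!r}")
    if s == "":
        result = s
    else:
        # s[0] waits by this door while the rest goes through the next one.
        result = reverse_traced(s[1:], depth + 1) + s[0]
    print(f"{indent}out: {result!r}")
    return result


if __name__ == "__main__":
    reverse_traced("emoclew")
```

All of the "in" lines print first, getting shorter and more indented, and then the "out" lines print as the word is rebuilt on the way back.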

Now things get weird (and very useful) when we get multiple inner magic boxes. Let's say instead of our single word, we have a word tree. This time we aren't reversing the letters, just collecting them to find out what all words we have in our tree.

   B
   A
  C R
 K   E

Wouldn't it be great, if we could find all the words at once? This is basically what recursion can do. We send our tree in to our magic box. It collects the first letter, sends the tree through the next door, gets the second letter, sends the rest of the tree through the next door, but now finds two letters where it normally finds one. So, open two doors, and send each remaining tree through its own door! (In the magic box this happens simultaneously. In actual Python the function goes down one door, comes all the way back, and then goes down the other, as we will see later in this post.) You can see how this is simpler than writing nested for loops, which would need to know ahead of time how deep the tree goes. Recursion is most useful when you have data in tree format, such as directory structures, classification hierarchies, organization charts, etc. Or, if you are trying to do a flood fill. For an awesome explanation of the flood fill algorithm (with cats and zombies!) see the Invent with Python blog. There are, of course, times when recursion is not the best solution, such as calculating the nth Fibonacci number.
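As a warm-up before the real Trie code, here is a minimal sketch in Python 3 of collecting the words from the little tree above. The nested dictionary and the function name collect_words are assumptions made for this example only; the Trie class we look at next stores its data differently.

```python
# The little tree from above, written as nested dictionaries.
# An empty dictionary marks the end of a branch.
TREE = {"b": {"a": {"c": {"k": {}}, "r": {"e": {}}}}}


def collect_words(tree, prefix=""):
    """Return every word spelled out by a root-to-leaf path in `tree`."""
    if not tree:
        # No more doors: the letters collected so far are a whole word.
        return [prefix]
    words = []
    for letter, subtree in tree.items():
        # One door per letter; each is explored in turn, not in parallel.
        words.extend(collect_words(subtree, prefix + letter))
    return words
```

Calling collect_words(TREE) returns the two words back and bare. The function never needs to know how deep the tree is; each call only deals with the letters directly in front of it.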

Great, now we are ready to look at Tries and the actual code to do this, and along the way introduce generators.

I was trying to understand Tries (aka Prefix Trees), and started playing around with the Python code on the Wikipedia entry for Tries. (You should probably go ahead and look at the whole article if you aren't sure what Tries are.) One of the last methods for Trie class in that code is items, which claims it returns an iterator.

def items(self):
    """Return an iterator over the items of the `Trie`."""
    for char, node in self.root.iteritems():
        if node.value is None:
            yield node.items
        else:
            yield node

So first let's see if we can figure out what this code is doing. Notice the yield statements. The yield statements indicate this is a generator. A yield statement is similar to a return statement, in that it sends the contents of whatever it is yielding to the calling function, but there is an important difference. As the name implies, it is only temporarily handing control over to the calling function. The generator is an object capable of creating and returning its members one at a time, because it is able to remember where it is when control is returned to it. Its purpose in life is iteration. If you have no idea what an iterable is, there is a great post on iterables here. This all may seem a bit fuzzy, but I hope by diving into this code some more, it will become clear.
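Here is about the smallest generator I can think of that shows the "remembers where it is" part, as a sketch in Python 3. The function three_letters and its log argument are made up for this illustration.

```python
def three_letters(log):
    """Yield 'a', 'b', 'c', noting in `log` each time the function runs again."""
    log.append("started")
    yield "a"
    # Execution picks up here, not at the top, on the next request.
    log.append("resumed after a")
    yield "b"
    log.append("resumed after b")
    yield "c"


if __name__ == "__main__":
    log = []
    gen = three_letters(log)  # nothing inside the function has run yet
    print(next(gen), log)
    print(next(gen), log)
    print(next(gen), log)
```

Creating the generator runs none of the function body. Each call to next runs it only as far as the next yield, and the log shows that each request resumes right after the previous yield instead of starting over.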

So, this code seems to be creating a generator that spits out the node items of our tree. That sounds right. We know from looking at the other methods in our class Trie, that node.value is None until we come to an end of a word. So, maybe it spits out letters until it gets to the end of the word, and then moves to the next node? How would that happen? Hmm, this isn't sounding quite right...

And, when I started playing around with this method, I found it did not return what I expected or wanted. I wrote a little code snippet to test the items method. To start with I just added 2 entirely different words to my Trie. Here is a graphical representation of my tree:

And here is my little script:

from trie import Trie
mytrie = Trie()
mytrie.add('terror',1)
mytrie.add('plant',2)
for letters in mytrie.items():
    print 'letters', letters

I thought this would give me back all of the letters in my tree, but instead I got the following:

letters <bound method Trie.items of <wiki_trie.Trie instance at 0x1101921b8>>
letters <bound method Trie.items of <wiki_trie.Trie instance at 0x11018eef0>>

Certainly not the letters I expected. So, what is going on here? Well, the first problem is that we seem to be yielding the wrong thing, if we want letters. It seems to be yielding the method itself. Well, yes, that is exactly what it is doing. What do you even do with that? Some sort of twisted recursion? Hmm, well no surprise that the char in the iterator is the letter, so let's try yielding that instead. But it turns out, that if we yield char, we only get a letter 't' and letter 'p'. Why aren't we getting all of our tree?

So, our tree is made of up to 26 nodes at each level (assuming we are just allowing the lower case alphabet). In our example, we are using 2 nodes at the top level, 't' for 'terror' and 'p' for 'plant'. So, it looks like our code is iterating through those two nodes and stopping, and now we have to figure out how to get it to iterate through each of the child nodes as well. So, how do we do that? Recursion! Remember how I mentioned that recursion is useful for tree traversal? So, what happens if we iterate over node.items()*? Let's change our method, and for now, let's just print what we get:

* Note the parentheses that were missing in the original code on Wikipedia - we are now calling the function, not yielding the name. And no, yielding the call would not have been any more useful than yielding the name, or at least I could not find a use for it...

    def items(self):
        """Return an iterator over the items of the `Trie`."""
        for char, node in self.root.iteritems():
            # node.value is None when not at end of word
            if node.value is None:
                yield char
                for i in node.items():
                    print 'i',i
            else:
                yield char

And, rerunning my little test script, I get:

$ python test_trie.py
letters p
i l
i a
i n
i t
letters t
i e
i r
i r
i o
i r

Aha! So, now we yield this, and we are in business. A quick note about yield vs. return. With yield, you are not waiting until the function exits to return your variable. Using the box analogy, you are not waiting for the block of wood to go through all its boxes and come back again. You are yielding your variable as soon as you are done with initial computations and hit yield, but remembering where you are, because you do expect to see the block of wood again. This means that, just like above where the i prints right away, you will get your letters one at a time as they are found.

We change our code again to this:

1    def items(self):
2        """Return an iterator over the items of the `Trie`."""
3        for char, node in self.root.iteritems():
4            # node.value is None when not at end of word
5            if node.value is None:
6                yield char
7                for i in node.items():
8                    yield i
9            else:
10               yield char

And we get our letters!

letters p
letters l
letters a
letters n
letters t
letters t
letters e
letters r
letters r
letters o
letters r

Note that Python 3.3 added the syntax yield from X, as proposed in PEP 380, which simplifies our inner loop. With it you can do this instead:

yield from node.items()
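Put together, a Python 3 version might look like the sketch below. Two things here are my own changes rather than the Wikipedia code: iteritems() becomes items() because Python 3 dropped iteritems(), and the method keeps descending even after the end of a word, so that a word like ant does not hide antic.

```python
from collections import defaultdict


class Trie:
    def __init__(self):
        self.root = defaultdict(Trie)
        self.value = None

    def add(self, s, value):
        """Add the string `s` to the Trie and map it to the given value."""
        head, tail = s[0], s[1:]
        cur_node = self.root[head]
        if not tail:
            cur_node.value = value
            return
        cur_node.add(tail, value)

    def items(self):
        """Yield every letter in the Trie, one branch at a time."""
        for char, node in self.root.items():
            yield char
            # Hand the job of yielding over to the child's generator.
            yield from node.items()
```

The yield from line does the same job as the two-line inner loop: every letter the child generator produces is passed straight up to whoever is iterating.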

But, what is going on in this code? It was straightforward when we had the print statement, but now we have recursion in a generator. We had to make it a yield statement, because in Python 2 a generator can't return a value with a return statement, but that does make things weird. So, now what is actually happening here?

Let's step through our loop. In our test code, we enter our loop and call our method to get the first letter. node.value is None, so it yields char, and yields control back to our test loop. Our test loop prints 'letters' and the letter yielded. Now we go to the next 'letters in mytrie.items()' in the test loop again. This is the point where, if we hadn't added that inner loop, we would just continue to loop through the top-level nodes. Instead, something happens that may be a little unexpected. When we return to our method, we return to the same place we left; we do not go back to the beginning again, and this time this has consequences. This is something that is often glossed over when generators and yield statements are introduced. It is not just the variables that are preserved, but the position in the function as well. In the box analogy, perhaps when you return to the previous room (function call), instead of magically appearing at the exit door, you have to continue down a hallway to get to the exit. And maybe you have to pull a lever or something else before you leave.

When we left our items method, we had just yielded char on line 6. We are still inside the 'if node.value is None' branch, so the next thing we do is enter our 'for i in node.items()' loop (I'll just call it the i-node loop). But, wait, node.items() is calling our method again (we hit recursion!), this time on the child node, so now we do go back to the beginning of our method, and we continue until we hit our next yield statement, yield char again. Now yield char is an 'l', since we are one level further down the tree. But, this yield is to the recursive call, so now control is returned to our i-node loop. Our i-node loop just has another yield, so now we yield again to the test loop, and print our 'l'. Okay, let's do another loop. We make our call from the test loop, and go back to our function, but where? The last yield statement was in the i-node loop, so that is where we return. The loop asks the inner generator for its next item, and the inner generator picks up where it left off, right after its own yield char. That is followed by its own i-node loop, where we finally get to create a new generator, and advance to the next letter. Every time through, we will create a new generator, and have to back out of that many more yields. While this is happening, Python is keeping the suspended state of every earlier call around, so each level can pick up where it left off. This may sound inefficient, but that is because we are only going down one branch at the moment. As soon as we have to back partway out, and go down a new branch, it becomes more obvious why it is good to keep the old state around: backing out of the yields tells us which level still has letters left to visit.

It is worth noting, that if this were ordinary recursion, it would not have worked. The first loop in our function is yielding char. If it were returning char, the inner loop could not advance, because it needs to know both what the last i was, and what node.items is in order to advance. In a normal recursive call, it would not have access to this information, since these variables are not being returned.

In our example of recursion with the magic boxes, we left a variable by the door, and picked it up on our way back out. We could do that because the call was still waiting on its return statement, so its local variables were still around. But once a normal function has returned, everything local to that call is gone, except what was in the return statement; call it again and you start over from the top. With a yield statement, you return to the same place in the function, with all of the information you had when you were in the function before. And, if there is code after the yield statement, it will run that code, as if it had never left the function.

To see that the code continues after a yield, you could put the if statement:

 if node.value:
    print "end of word", node.value

after the i-node loop. It will print the end of word, as if it is finding the end of the word as it exits that node on its way back up the tree, as shown here:

maria@lamia:~/python/test$ python test_trie.py
('letters', 'a')
('letters', 'n')
('letters', 't')
('letters', 'i')
('letters', 'c')
end of word 2
end of word 1

for words ant (1) and antic (2)

Switch the order, and now suddenly the world makes sense...

$ python test_trie.py
('letters', 'a')
('letters', 'n')
('letters', 't')
end of word 1
('letters', 'i')
('letters', 'c')
end of word 2

Let's think about our data a minute. I think the graphic on the Wikipedia site is a bit misleading. I think there should be lines drawn horizontally between nodes in the same branch, because it is possible, and necessary, to travel between nodes at the same level in the same branch. Like so:

When we ask for the "next item" using the items method in the case of no recursion, it gives us the next node at that level. But if we are inside our method, and ask for the next node (recursion), we are essentially in a node and asking for the next node of that particular branch. So, in the drawing above, if you are in the "A" node, and ask for the next node, then you get "N" as that is the next node under "A". But if you are not in the "A" node, then the next node is "B". When you get down to the "T" node, you will get an "E", "I" or "S" when you use recursion. But, if you are yielded the "I" and ask for the next letter without recursion, you will get the "S". This will happen when you are moving from one branch to the next, because the method will back out of all of the recursive calls, and then go to the next item, which will be the head of the next branch. There it will hit recursion again, and descend down the next branch. Note that our code is not going down branches simultaneously at this point. This is because we do not have a call to split the recursion to go down the separate branches at once. When it hits a branch, it just goes down the first one it finds. We can consider this in a future post.

Okay, that's quite a bit for one post. Next post, let's try to expand our items function to be more useful.

Many thanks to the Seattle Python Interest Group for helping me sort this all out!


Python and Mac OS

Tech, Code, Python, Mac

by Maria on 17 Jun 2013 - 23:51  

Mac comes with Python, but it is slightly out of date, and Apple has made changes that may become problematic. We could just install Python directly using the Mac installer, but installing additional packages then gets awkward, because this is generally done on the command line, and the Python available from the command line is the system Python, not the one you installed in the Applications folder. Additionally, when you are creating a new project, it is helpful to know what additional packages are necessary for that particular project. Homebrew and virtual environments give you a nice solution to these problems.

We will use Homebrew to install Python and some other utilities. You can even install both Python 2 and Python 3: python will refer to Python 2, and python3 will start Python 3. Homebrew is a package manager, so it will not only install stuff, but also keep track of what is installed and allow us to upgrade and uninstall easily.

Then we will use virtualenvwrapper to set up our project. Virtualenvwrapper sets up a virtual environment that allows you to decide what additional modules you want to use for your project. For stuff already installed, it actually creates a link, rather than re-installing the same modules over and over. So, it isn't going to fill your hard drive, but it lets you isolate what you are using for a particular project.

To get started, open a terminal and run this to install Homebrew:

$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

If you don't have /usr/local/bin in your path yet (you probably do), add it now by putting the following in your .bashrc and restarting your shell.

export PATH=/usr/local/bin:$PATH
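If you want to double-check that the change took effect, a small sketch like this can help. The function name comes_first is made up for illustration; it only inspects the PATH string and does not look at what is actually installed.

```python
import os


def comes_first(path_value, preferred, other):
    """Return True if `preferred` is on the path and listed before `other`."""
    entries = [entry.rstrip("/") for entry in path_value.split(os.pathsep) if entry]
    if preferred not in entries:
        return False
    if other not in entries:
        return True
    return entries.index(preferred) < entries.index(other)


# Check the current shell's PATH: Homebrew's bin directory should win over /usr/bin.
print(comes_first(os.environ.get("PATH", ""), "/usr/local/bin", "/usr/bin"))
```

If this prints False in a new terminal, the export line is probably not being read, or something later in your startup files is putting /usr/bin back in front.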

Python actually comes with its own package manager, pip, which allows you to install additional Python packages. You will still use Homebrew for other packages that you may want. Don't worry too much about using the two package managers. Homebrew knows about pip and its installations, and they play nicely together. If a package doesn't install with pip, try brew; maybe it wasn't really a Python package. For more information about Homebrew and Python, see here.

Now we want to install Python and some other packages. Here is a list of packages that I generally find useful. It is a good idea to always update brew before installing or updating other packages:

$ brew update
$ brew install python
$ brew install nose # necessary for testing
$ brew install gfortran # necessary for numpy/scipy
$ brew install numpy
$ brew install scipy
$ brew install pillow # this will install the Python Imaging Library (PIL)
$ brew install wxmac # very nice for making Python GUI applications that port to Windows, Mac and Linux. http://www.wxwidgets.org/
$ pip install ipython # interactive python - bash in python! As well as other good stuff... http://pyvideo.org/video/640/ipython-python-at-your-fingertips
$ pip install matplotlib # make graphs!

Virtualenvwrapper is just virtualenv with a wrapper around it. The wrapper provides some nice utilities to use the virtualenv. Use pip to install both:

$ pip install virtualenv
$ pip install virtualenvwrapper

After installing, add to .bashrc:

export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Devel
source /usr/local/bin/virtualenvwrapper.sh

You will need to reload bash; the easiest way is to exit your terminal and start a new one. Useful virtual environment commands:

$ mkvirtualenv test # Creates a virtual environment for project test
$ lsvirtualenv # See what virtual environments you have
$ workon test # Enter a previously created virtual environment
$ deactivate # Exit virtual environment (use system python again)

More at the virtualenvwrapper website.

Now when you want to work on a project you have created a virtual environment for, you just type workon and you will see a list of possible projects (or just workon test to work on project test).
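If you are ever unsure whether a script is running inside a virtual environment or with the system Python, you can ask Python itself. This is a minimal sketch with a made-up function name; it relies on older virtualenv releases setting sys.real_prefix, and on newer virtualenv releases and the built-in venv module making sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases add a real_prefix attribute to sys.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases leave base_prefix pointing at the
    # interpreter the environment was created from.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(in_virtual_environment(), sys.prefix)
```

Run it once after workon test and once after deactivate, and the first value should flip from True to False.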

So, now what? Well, let's say we want to work on a Django project. We could do this:

$ mkdir testproject
$ cd testproject
$ mkvirtualenv testproject
New python executable in testproject/bin/python2.7
Also creating executable in testproject/bin/python
Installing setuptools............done.
Installing pip...............done. 

We see that some things were automatically installed in our virtual environment, basically enough to install the stuff we need. Now while still in our virtual environment, let's install Django, and start our project.

$ pip install django
$ django-admin.py startproject testproject
$ git init
$ git add .
$ git commit -a -m 'Initial commit of testproject'
$ deactivate

What's all that git stuff? Git is version control software. Contrary to popular belief, you do not need a GitHub account to use Git. More information about Git and version control can be found here.

Okay, now we have the very beginnings of our new Django project. Next time we want to work on it, we just do:

$ workon testproject

and we are on our way! If we want to see what packages we are using with our project, we can just use lssitepackages while in our virtual environment. You can use this to create your requirements file, so that your project will be easy to use or develop on another machine or platform.
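The usual command-line way to produce a requirements file is pip freeze > requirements.txt, run inside the virtual environment. To see roughly what goes into it, here is a small sketch that builds the same kind of name==version lines from Python. It assumes Python 3.8 or later, where importlib.metadata is in the standard library, and requirement_lines is a made-up name.

```python
from importlib import metadata


def requirement_lines():
    """Return sorted 'name==version' lines for the packages this interpreter can see."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with incomplete metadata
            lines.add("%s==%s" % (name, version))
    return sorted(lines, key=str.lower)


for line in requirement_lines():
    print(line)
```

Inside a fresh virtual environment the list stays short, which is exactly what makes it useful as a record of what the project needs.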


PyCon 2013

Python, Code, MyRamblings, Tech

by maria on 29 Mar 2013 - 05:40  

I went to PyCon in Santa Clara this year, and really enjoyed it. I learned a lot, and made quite a few connections.

First the unpleasantness: the Adria Richards debacle. Much has been written already, so I'll make this brief. Adria Richards tweeted a picture of two men who were making sexual jokes behind her during a talk at the conference. Whether or not Adria chose the 'best' course of action for pointing out inappropriate behavior at a tech conference is an open question, and quite frankly beside the point. She chose what she thought was the best tool at the time, and there is no way she could have predicted what followed. What followed was a massive onslaught of threats and insults that was completely beyond the pale and speaks volumes about how much sexism exists in the tech community. The reaction shows that this community can be a very uncomfortable and often downright hostile place for women, and when incidents like this happen, it makes me incredulous that some people still wonder why women leave the IT community. If you would like to read more, I recommend these articles:

If you want to be depressed about the general state of conditions for women in IT, check out geekfeminism.

And now, on to much better things. My favorite talk was 'The Naming of Ducks: Where Dynamic Types Meet Smart Conventions' by Brandon Rhodes. It was very informative, and done with humor and great slides. My biggest pet peeve about technical talks is slides containing huge swaths of code. Most of the room can't even read it all, and all of that code usually distracts from the speaker's point anyway. His slides had nice, small bits of code, stripped down to the bare essentials to make the point. His talk is up on the awesome pyvideo site:

http://pyvideo.org/video/1676/the-naming-of-ducks-where-dynamic-types-meet-sma

And you can see the slides here:

http://pycon.github.com/2013-slides/Naming%20Ducks%20by%20Brandon%20Rhodes/

Another favorite talk, which was just chock-full of useful tidbits, was 'Transforming Code into Beautiful, Idiomatic Python' by Raymond Hettinger, another engaging, humorous speaker. His talk can also be seen on the pyvideo site:

http://pyvideo.org/video/1780/transforming-code-into-beautiful-idiomatic-pytho

and his slides are also available:

https://speakerdeck.com/pyconslides/transforming-code-into-beautiful-idiomatic-python-by-raymond-hettinger-1

I also participated in a couple of days of sprints, and based on my experience, I have some unsolicited advice for anyone wanting to run a sprint. The purpose of a sprint is two-fold: the current software developers on the project want to get a piece of software out there, and the new software developers want to help. Those are the basics. It is hoped that everyone will learn something and have some fun as well. To accomplish this, the current software developers should do some homework before the sprint. If you actually want the new software developers to be able to help you, you must be able to get them up and running as soon as possible. Here are the most important steps, as I see it:

  • Make the development environment for your project easy to set up.
  • Document how to set up the development environment.
  • Follow your documentation and seriously spend time installing on new computers, and/or wipe out your environment and re-install a couple of times. Simplify the procedure, make the instructions clearer, and repeat.
  • Create documentation on how the group uses version control and which software it uses.
  • List out some tasks that need to be done, and rate the tasks by complexity and size.
  • Have an example test (or tests) to ensure that nothing has been broken by the new code.

I repeat, the more time you spend making sure collaborators can hit the ground running, the more help they can give you. This not only helps for the sprint, but will make your project more welcoming to potential contributors in general.


Charlie Rose

Health, Code, Tech, Science

by maria on 26 Mar 2013 - 22:01  

I was on TV! Well, not really. My boss, Michael Shadlen, was on the Charlie Rose Show with Eric Kandel of Columbia University, Walter Mischel of Columbia University, Daniel Kahneman of Princeton University, and Alan Alda, host of the upcoming PBS program “Brains on Trial”. But he showed some movies I made, so that's cool. There is a bit more information about the movies I've made for him here. The episode he is on is called "Public Policy Implications of the New Science of Mind", and it is part of Charlie Rose's Brain Series. The whole show is very good, and I encourage you to watch all of it, but if you want to see the part where Mike talks about our research and shows the movies, go to 37:20.

I created these movies by importing the experimental data into ActionScript and coding a re-creation of the experiment, with the eye position of the animal superimposed on the re-creation of what she/he was seeing on the screen during the task. The spike train was added to the video, both visually and audibly, so you could get an idea of what was going on in the brain at the time.

Link to Charlie Rose Show
