Thursday's Raspberry Pi meetup

I went to the Raspberry Pi meetup on Thursday, January 14, and would like to share some links related to the presentation I gave and the things I brought with me.

  1. Instructions on setting up OpenCV on Raspbian Jessie –

  2. My fork of Thiago’s automatic doorman that works with PiCamera –

  3. Python library and documentation for PiCamera –

  4. OpenCV tutorials –

  5. Soil moisture sensor (or soil hygrometer) –

  6. Soil moisture sensor tutorial –

  7. Since then I found a much better tutorial here –

One important thing to be aware of with the soil moisture sensor: do not apply voltage to the soil probe continuously, and do not take too many measurements in a row, as the leads on the probe will quickly corrode. Also, the probe has an analog data output, so it looks like it would work better with an Arduino board than a Raspberry Pi, since the latter doesn't have analog input pins.
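To illustrate that sampling policy in code (GPIO wiring omitted; the class name, the 10-minute default, and the injectable clock are my own assumptions, not code from the tutorials above), a small rate limiter could gate the readings:

```python
import time

class ProbeScheduler:
    """Gate soil readings so the probe isn't powered continuously.

    This only models the timing policy; actually switching the probe's
    power pin and reading the sensor is left to the GPIO code.
    """

    def __init__(self, min_interval_s=600, clock=time.monotonic):
        self.min_interval_s = min_interval_s  # assumed 10-minute spacing
        self._clock = clock                   # injectable for testing
        self._last = None

    def may_sample(self):
        """Return True when enough time has passed since the last reading."""
        now = self._clock()
        if self._last is None or now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False
```

The GPIO code would power the probe, read it, and power it down only when may_sample() returns True, leaving it unpowered the rest of the time.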

The fun of premature optimization

Last week I deployed a redesigned website that included a new search feature. Looking at the server logs, I noticed that users of the site use search. A lot.

I wanted to know more about what kind of questions end-users were asking, so I wrote a short program to parse the logs, extracting query strings formed with the HTTP GET variable q=query+text. It was fairly easy to do, especially with apache_log_parser (pypi, github), which works with any web server that supports the common log format. Nginx is compatible; its default access log line can be parsed with this format string – '%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'

The program returns an array of records, each containing a query string, a remote IP address and a timestamp. I then toyed with the idea of measuring its performance, so I tested several possibilities with the crudest tool ever – the time command, using the real (wall-clock) time value.

The setup: the program is implemented in Python 3 and was run on an Intel Celeron 2955U; the input log file size is 15MB. All the code changes were cumulative – performance was measured with all the changes up to that point.

Source code of the initial version.

from pprint import pprint
import apache_log_parser
nginx_parser = apache_log_parser.make_parser(
    '%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"')
log_file = "access.log"

def parse_queries(in_query):
    """Return one or more quieries in the request stripping 'q' member"""
    for q in in_query:
        q = set(q)
        if 'q' in q:
            out = q - {'q'}
            yield out.pop()

def prep_item(label, parsed_line):
    """create a dictinary with label, timestamp and IP address members"""
    timestamp = parsed_line['time_received_tz_datetimeobj']
    remote_ip = parsed_line['remote_host']
    return dict(query=label, time=timestamp, ip=remote_ip)

def extract_searches(filename):
    """parse the logfile and return list of dictionaries with query, time ip"""
    search_list = []
    with open(filename, 'r') as log:
        for line in log:
            outp = nginx_parser(line)
            if 'request_url_path' not in outp:
                continue
            if '/search/' not in outp['request_url_path']:
                continue
            for item in parse_queries(outp['request_url_query_list']):
                search_list.append(prep_item(item, outp))
    return search_list

if __name__ == "__main__":
    searches = extract_searches(log_file)

Running this script under time gave the following results.

Run # time (seconds)
1 11.052
2 11.139
3 11.019
4 11.073
total average 11.0708

What would happen if the list accumulator in extract_searches were replaced by a generator, consumed outside the function by a list comprehension? The code would look much cleaner. Generators are a little slower than in-memory operations, but list comprehensions are much faster than repeated list appends, so on average this should be the same or slightly faster.

def extract_searches(log_fd):
    for line in log_fd:
        outp = nginx_parser(line)
        if ('request_url_path' not in outp) or \
                ('/search/' not in outp['request_url_path']):
            continue
        for item in parse_queries(outp['request_url_query_list']):
            yield prep_item(item, outp)


if __name__ == "__main__":
    with open(log_file, 'r') as logfile:
        searches = [x for x in extract_searches(logfile)]

As it turns out, on average this particular piece of nicer Python code is marginally slower.

Run # time (seconds)
1 10.991
2 11.219
3 11.431
4 11.141
total average 11.1955

Another possible optimization was to replace the if 'q' in q: statement with a try .. except block, since at that point in the code the statement should almost always evaluate to True. After running the program without the conditional, it turned out my guess was wrong – the statement always evaluated to True, so the check could simply be dropped.

def parse_queries(in_query):
    for q in in_query:
        out = set(q)
        out -= {'q'}
        yield out.pop()

After this modification, the average runtime improved enough to make up for performance hit from the pythonic changes.

Run # time (seconds)
1 10.950
2 11.119
3 11.090
4 11.028
total average 11.0468
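For reference, the try .. except variant I had originally considered would have looked something like the following (a sketch; the function name is hypothetical, and this is not the code I ended up keeping):

```python
def parse_queries_try(in_query):
    """Assume 'q' is present and handle the rare miss instead of testing."""
    for q in in_query:
        out = set(q) - {'q'}
        try:
            yield out.pop()
        except KeyError:
            pass  # the pair contained only 'q', no query text to report
```

Since the KeyError branch never fired on my data, removing the conditional outright achieved the same effect with less code.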

Finally, I’m only interested in URLs containing ‘/search/’, so what if the whole log line is searched for ‘/search/’ without attempting to extract the request URL string first?

def extract_searches(log_fd):
    for line in log_fd:
        if '/search/' not in line:
            continue
        outp = nginx_parser(line)
        if ('request_url_path' not in outp) or \
                ('/search/' not in outp['request_url_path']):
            continue
        for item in parse_queries(outp['request_url_query_list']):
            yield prep_item(item, outp)

The improvement was an order of magnitude. It turns out the common log format parser is just a nicely abstracted set of regular expressions – a rather neat set, at that (line 141) – but running several regular expressions to parse a line is far slower than a simple substring comparison.

Run # time (seconds)
1 1.025
2 1.022
3 1.026
4 1.040
total average 1.0285
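To see why the substring pre-filter wins, here is a minimal sketch of what a combined-log-format parser does internally – one regex with a capture group per field (the pattern is my own approximation, not the library’s actual code):

```python
import re

# Approximate combined log format: host, identity, user, time, request,
# status, size, referer, user agent.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

line = ('203.0.113.7 - - [20/Oct/2015:10:00:00 +0000] '
        '"GET /search/?q=pi HTTP/1.1" 200 512 "-" "curl/7.35.0"')

# The cheap check runs first; the expensive regex only runs on the few
# lines that can possibly be search requests.
if '/search/' in line:
    match = LOG_RE.match(line)
    print(match.group('request'))  # GET /search/?q=pi HTTP/1.1
```

A single `in` test over the raw line is a simple memory scan, while the regex has to backtrack through nine capture groups on every line, which is where the 10x difference comes from.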

I can think of two takeaways from this exercise. First, the law of leaky abstractions is still with us: even when the abstracted code is relatively straightforward and works well, there are implementation details to be aware of. Second, cleaning code up and making it more pythonic helps me think about program flow differently, with much greater potential benefits in the long run, even if in the short run the program’s performance may decrease.

My awesome 15K MEC Run

These are my experiences from MEC Race Seven, which took place on October 20, 2015. I posted this a while ago on the EYTR page, and now I’ve finally got a couple of photos of it.

From start.

MEC Race Start

To finish.

MEC Race Finish

I’m still not sure how that’s possible, but I ran the MEC 15K in 1:09:42.0, which made me the #11 finisher.

My watch shows an average pace of 4:39 min/km, which is not the kind of number I’m used to or expected.

I started the run at a pace that was way too fast, breathing like a steam locomotive for the first 2K. Then I followed a runner who was keeping a pace of 4:20-4:30 min/km, and I managed to keep up until the 4.5K mark. After that I started falling behind, but never by more than a couple hundred meters.

From the beginning of the race I thought I would do my best and then completely fall apart around 10K, but that didn’t happen. After 8K my calves started to hurt, so accelerating or returning to my average pace became increasingly difficult, but not impossible.

I didn’t fall apart at 10K, but the 10K point was also a turnaround near the finish line, where the course goes another 2.5K in the direction totally opposite to where the majority of runners are heading, and running that way felt kind of demotivating.

The stretch from 11K to 13K was the hardest of the race, but somehow, whenever I looked at my watch, the dreaded pace indicator at the bottom still managed to hover on both sides of the 5:00 min/km mark.

The remainder of the race became a mental battle that turned from not falling apart before 10K, to not falling apart during the next K, and finally to not falling apart during the last K. I’m glad to report that I managed.

I’ve been running around 25-40K in each of the last three or four weeks, but never more than 15K at a time. I would also sometimes run at a pace of 4:30-4:40 min/km during our regular runs with Becca or Navin, but no more than 2 or 3K at a time. Apparently, both of those things really helped.

Thanks guys and gals!

Raspberry Pi Wireless config

Here’s the working wireless configuration I use to connect my Raspberry Pis with USB wifi dongles to my home network.

I’ve been using Realtek RTL8188CUS and Ralink RT5370 based devices rated for 150Mbps.

Dongles based on the Ralink chip have been working flawlessly, but I have experienced two issues with the Realtek-based device: the adapter just doesn’t work in AP mode, and if there’s no network activity it will drop into a power-saving mode, from which it only resumes when packets are sent from the host.

In practical terms, the host becomes unavailable to anyone who attempts to initiate a remote connection; however, this can be mitigated with a cron job that pings the router every minute.
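A minimal crontab entry for that workaround might look like the following (the router address 192.168.1.1 is an assumption; substitute your own gateway):

```shell
# /etc/cron.d/wifi-keepalive - one ping a minute keeps the Realtek
# dongle out of power-saving mode
* * * * * root ping -c 1 192.168.1.1 > /dev/null 2>&1
```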

This configuration doesn’t rely on a GUI or NetworkManager starting up; it assumes that the wpasupplicant package is installed and the wireless adapter is recognized by the system as wlan0.

Add the following to /etc/network/interfaces

allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf
iface default inet dhcp

Edit /etc/wpa_supplicant/wpa_supplicant.conf as such.

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev

network={
        ssid="AP Name"
        psk="AP Secret"
}

Hello world

Lately I’ve been writing way too many notes that are somewhat too valuable to end up in the recycling.

This is a more-or-less boilerplate post, so that there’s actually going to be something out there once I manage to set up nginx to serve my blog, and not just a sad, empty default blog everywhere.

Jekyll turned out to be a little too complicated in the documentation department – the documentation pages dive straight into its internal structure and into Liquid, apparently the de-facto templating engine for Ruby on Rails. All I wanted to do was write a new post whose content would display in full. I wanted to learn bits of Ruby – it’s a really nice language – but I think Jekyll might not be the best starting point.

Anyway, watching way too easy, slow and boring explanations on YouTube eventually helped, though I still had to transform index.html into something like the following.

<div class="post-list">
{% for post in site.posts %}
    <span class="post-meta">
        {{ post.date | date: "%b %-d, %Y" }}
    </span>
    <a href="{{ post.url | prepend: site.baseurl }}">
        {{ post.title }}
    </a>
    <div class="post-content">
        {{ post.content }}
    </div>
{% endfor %}
</div>

Also, there’s no way I’m typing the term endhighlight correctly on the first try.
