Splitting text at word boundaries

Here’s a small Python function I wrote this morning to split text at word boundaries. It might be handy if you need to store long text in a database as multiple records because the column has a fixed maximum length (i.e., the database doesn’t support LOBs).
import re

def split_content(content, slice_length=50):
    """returns a list of strings after splitting
    the content at a maximum of slice_length or
    at the last word boundary before that

    if a slice has no word boundary at all, it is
    cut at slice_length"""

    out = []
    s = 0
    word_boundary = re.compile(r'\s')
    while True:
        # get the next slice
        t = content[s:s + slice_length]
        if t == '':
            break
        # only look for a word boundary when more text follows;
        # a final remainder that already fits is kept whole
        if s + slice_length < len(content):
            # search the reversed slice: the first whitespace found
            # there is the last word boundary in the slice
            m = word_boundary.search(t[::-1])
            if m is not None:
                t = t[:len(t) - m.start()]
        s = s + len(t)
        out.append(t)
    return out
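
To make the database use case a bit more concrete, here is a rough sketch of storing the pieces as numbered rows and putting them back together. Everything in it is an assumption for illustration: the `text_parts` table, its column names, the helpers `store_text` and `load_text`, and the in-memory SQLite database standing in for a real one. Note that SQLite does not itself enforce the `VARCHAR(10)` length, so the limit here comes only from the splitter. `split_at_word_boundaries` is a hypothetical compact variant of the function above; it keeps a final remainder whole when that remainder already fits in one piece.

```python
import re
import sqlite3

_WHITESPACE = re.compile(r'\s')

def split_at_word_boundaries(content, slice_length=50):
    """Hypothetical compact variant of the splitter shown above."""
    pieces = []
    start = 0
    while start < len(content):
        piece = content[start:start + slice_length]
        # Back up to the last whitespace only when more text follows.
        if start + slice_length < len(content):
            m = _WHITESPACE.search(piece[::-1])
            if m is not None:
                piece = piece[:len(piece) - m.start()]
        pieces.append(piece)
        start += len(piece)
    return pieces

def store_text(conn, doc_id, text, column_length=50):
    """Store text as numbered rows of at most column_length characters."""
    pieces = split_at_word_boundaries(text, column_length)
    rows = [(doc_id, seq, piece) for seq, piece in enumerate(pieces)]
    conn.executemany(
        "INSERT INTO text_parts (doc_id, seq, part) VALUES (?, ?, ?)", rows)

def load_text(conn, doc_id):
    """Reassemble the text by joining the rows in sequence order."""
    cur = conn.execute(
        "SELECT part FROM text_parts WHERE doc_id = ? ORDER BY seq", (doc_id,))
    return ''.join(row[0] for row in cur)

# Assumed schema: one row per piece, ordered by seq within a document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE text_parts ("
    "doc_id INTEGER, seq INTEGER, part VARCHAR(10), "
    "PRIMARY KEY (doc_id, seq))")

text = "the quick brown fox jumps over the lazy dog"
pieces = split_at_word_boundaries(text, 10)
# pieces == ['the quick ', 'brown fox ', 'jumps ', 'over the ', 'lazy dog']

store_text(conn, 1, text, column_length=10)
restored = load_text(conn, 1)  # identical to text
```

Because each piece keeps its trailing whitespace, joining the rows in `seq` order gives back the original text exactly, with no need to guess where spaces were.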