before the moss covers the thoughts...

XSV Cookbook

Recipes I usually use with XSV, a Swiss Army knife for field-separated files

XSV is an extremely fast CLI utility for processing csv/tsv/psv files. It is extremely helpful for automating data extraction, format conversions, filters etc.

It comes with an extensive man page, and the website also has great documentation. Still, I am noting down some of the common ways I use it below.

Generate fake data to test performance

I can't really test for scale because we don't have enough data

Often when we develop, we end up testing on our local machines with very few records in the database. Performance testing, re-architecting your solution for scale etc. need a lot of data already present in the db - concurrently adding or updating 1000 records in an empty table is very different from doing it when the table already has a few million records.

It is actually quite simple to generate fake data and load it into a database. To simulate extreme stress, you don't need to generate obscene loads - you can reduce your available capacity to small CPU/RAM and get the same effect with moderate loads.

Read below for code snippets on how to do the first part - getting a database up (docker + mariadb), generating data (python), loading it (usql) and then changing database capacity (docker) to see what you need to tune.
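As a rough illustration of the data generation step, here is a minimal sketch that uses only the Python standard library. The column names (`id`, `name`, `email`, `amount`) and the helper names are made up for this example and are not taken from the article; swap in whatever your own table needs.

```python
import csv
import io
import random
import string

# Hypothetical columns, chosen only for illustration.
FIELDS = ["id", "name", "email", "amount"]


def fake_rows(n, seed=42):
    """Yield n fake rows; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    for i in range(1, n + 1):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        yield {
            "id": i,
            "name": name,
            "email": f"{name}@example.com",
            "amount": round(rng.uniform(1, 1000), 2),
        }


def write_csv(rows, out):
    """Write rows to an open text file object and return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def csv_text(n):
    """Return n fake rows as CSV text, handy for a quick preview."""
    buf = io.StringIO()
    write_csv(fake_rows(n), buf)
    return buf.getvalue()
```

To produce a large file, open it with `open("fake.csv", "w", newline="")` and pass it to `write_csv(fake_rows(1_000_000), f)`. Because rows are generated lazily, memory use stays flat regardless of the row count.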

1to10: doom-emacs and org-roam, gpg

Note your thoughts with least friction!

Already set up doom-emacs and started with org-mode? If not, read my previous article on 0to1: doom-emacs and org-mode.

Since then, I've also started using org-roam, which is an emacs implementation of roamresearch.com's note-taking tool. Following are my notes on how to get this going, and on how to encrypt part of your notes using gpg.

Generate docs - the unix way

Or connecting tools with make

As programmers, a lot of us find it easier to just type content using text-based formats like markdown. However, we often need to share these as PDF, Word etc. with our colleagues.

Quite often, we may need to create several documents that share the same content. A simple example is a copyright footer or even an intro paragraph about your work.

A short walkthrough using venerable tools like make, m4 and pandoc to get this done follows.

0to1: doom-emacs and org-mode

Heard about org-mode, but afraid of emacs? I was. Here is how to get going!

Are you like me?

  • Know VIM/VI enough
  • Heard about emacs and org-mode
  • Tried it and got turned off by so many commands
  • Read too much documentation
  • Spent hours configuring these :)
  • Thumb started aching with C and M multi-key combinations
  • Tried Spacemacs
  • Went back to VIM

If yes, this is how I crossed over to actually using org-mode productively to maintain my notes, projects and todos. doom-emacs is the framework I used. It is superfast, has sane keys and UX, and its configuration does not require you to read up on and practice emacs.

Following is a 0to1 guide that ideally should've been more interesting with screenshots, but keeping it as copy-pastable text!

Generate JSON from PostgreSQL

You can easily format your query output as JSON!

So, you have a PostgreSQL production database with a slave instance being used for reports and ad-hoc queries. Every now and then you get the same kind of data requests, but for different customers, orders etc. Instead of sharing SQL templates with different users and getting them set up to query, it is very easy to make quick shell scripts that output JSON formatted data. Then, you can easily have your web developers build a UI for search and display.

Read on to install a local database using docker, fill it up with sample data, script psql, enhance the script to wrap the output as JSON and finally hook it up as an API.

Weekwise Anomaly Visualization

Many metrics are easier to visualize as a weekly table. Add anomaly highlights to that as well!

Update May 28, 2020 - code using this is added to my covid 19 tracker repo with actual output

I find looking at daily numbers folded by week (i.e. Mon to Sun in one row) for the last 8 weeks a good way to look at the data. Especially if multiple tables are put one below the other for the same period, it is easy to identify patterns. Sample tables in a typical e-commerce system could be orders, shipments, payment_failures etc.

Instead of looking at this manually, why not use ML to automatically highlight anomalies? I tried various things with hand-coded models first, then ARIMA, and finally settled on the FBProphet library.

Read below to see how a simple system can be built with publicly available data. Bonus - if you have not used jq and xsv, you can see how cool those are too.
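The Mon-to-Sun folding described above can be sketched in a few lines of standard-library Python. This is only an illustration of the table layout, not the code from the article; the function name `fold_by_week` is made up here, and anomaly detection is left out entirely.

```python
from datetime import date, timedelta


def fold_by_week(daily):
    """Fold {date: value} into rows keyed by the Monday of each week.

    Each row has 7 slots (Mon..Sun); days without data stay None.
    """
    weeks = {}
    for day, value in daily.items():
        # date.weekday() is 0 for Monday, so this steps back to Monday.
        monday = day - timedelta(days=day.weekday())
        row = weeks.setdefault(monday, [None] * 7)
        row[day.weekday()] = value
    # Return the weeks in chronological order.
    return dict(sorted(weeks.items()))
```

Stacking the output of this function for several metrics (orders, shipments and so on) over the same date range gives the one-below-the-other tables mentioned above.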

VS Code as Git GUI

Why search for that perfect git gui? Use VS Code!

Though I prefer VIM, I've also been using Visual Studio Code a lot for the past year. It is quite fast, has a great Python code/debug environment and has beautiful font rendering on Linux. I also use git a lot and often look for different UIs to deal with it rather than remembering all the commands. gitk, gitg, tig, git-cola etc. are some of the tools I've used before. However, VS Code offers a very useful git environment right out of the box. Add a small extension, git-graph, and I am all set.

Following is a quick screenshot tour that explains various features.

Customize your Video on Conferencing

The world is doing WFH and video conferencing. Let us make it a bit more private

For the last few weeks, the whole world has been under various measures of lockdown. WFH (Working from home) and VC (Video Calls) have been the norm. One of my pet peeves with the various VC tools I have used (Google Meet, Zoom etc.) is that they offer zero customization of the output feed from your camera.

For most video calls, just the face is enough - why bother with the rest of the image? It also helps you be a little more relaxed with informal clothing in summer; plus cute moments like children running into the video feed can be avoided.

This is what I got my feed to be - read below on how to set it up.

Leo + Sphinx = painless documentation

Awesome combination to make it easier to write documentation.

Writing documentation, be it for your user manual or for design, is a black and white task. Either you love it or you hate it!

If you love it, chances are that you prefer writing it in a distraction-free environment with simple markup, rather than using word processing tools. Here is how you can do it very easily.

Gmail : GUI for your backend!

Why build an app if you can use Gmail to get data?

Often, we need to collect data from customers. The immediate thought is to make an app change, push it up to the store, get people to upgrade (doesn't happen!) and then pray. A slightly more savvy version of this is making a web app and then sending out links with a uuid-based customer identification.

Why not just ask users to email you the data? People know how to use email! The following snippet shows how easy it is to automate fetching the data and then sending it to a script to process.

Calendar Heatmaps from Dataframes

GitHub popularized this form of contribution heatmap. Calmap makes them easy to make!

GitHub's contribution map is such a great visualization to see the activity over a year. There are several JavaScript versions of this that provide interactive visualizations of the data; but when you don't need interactivity and just want to visualize multiple data points over the same time axis to see any trends, Calmap is a super simple library that can generate those.

Continue to see what we can make with our own data!

Update May 28, 2020 - code using this is added to my covid 19 tracker repo with actual output

Remind - it does what its name says!

Just don't forget anything - use this wonderful little Unix tool that runs on plain text.

remind is a classic command line unix utility. A very simple collection of text files holds all your reminders in a readable format. Let us make a quick cron job using this to send daily notification emails.

Are we there yet?

Scaling time triggered checks without crons crawling all over!

What often start as simple databases for simple solutions grow as the business grows. Business requirements also grow over time; after all, there is only so much that can be supported by people monitoring frequent reports at regular intervals and alerting others about possible situations.

SQLite3 CTE tricks for time series analysis

Hands-on analysis of git log data with SQLite3 CTEs!

SQL has been around for ages. sqlite3 gives you a phenomenal tool to quickly load and analyse data in a language meant for that. While I’ve used it for a long time, only recently did I learn about its support for CTEs, aka Common Table Expressions.
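To give a flavour of what a CTE can do for time series, here is a minimal sketch using Python's built-in sqlite3 module. The `commits(sha, day)` table and the function name are assumptions made for this example, not the schema from the article. A recursive CTE generates every calendar day between the first and last commit, so days with no commits show up with a count of zero instead of silently disappearing.

```python
import sqlite3


def commits_per_day(rows):
    """rows: iterable of (sha, 'YYYY-MM-DD') tuples (hypothetical schema).

    Returns [(day, count), ...] with gap days filled in as 0.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (sha TEXT, day TEXT)")
    conn.executemany("INSERT INTO commits VALUES (?, ?)", rows)
    query = """
    WITH RECURSIVE days(day) AS (
        -- anchor: the earliest commit day
        SELECT MIN(day) FROM commits
        UNION ALL
        -- step: add one day until the latest commit day is reached
        SELECT date(day, '+1 day') FROM days
        WHERE day < (SELECT MAX(day) FROM commits)
    ),
    counts AS (
        SELECT day, COUNT(*) AS n FROM commits GROUP BY day
    )
    SELECT days.day, COALESCE(counts.n, 0)
    FROM days LEFT JOIN counts ON counts.day = days.day
    ORDER BY days.day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The same pattern (generate the time axis with a recursive CTE, then left join the real data onto it) works for weekly or monthly buckets by changing the date modifier.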