Privacy in today's age with a SOCKS proxy
Posted on August 7, 2019
(Last modified on March 7, 2021)
| 3 minutes
Say you are at a cafe and want to surf the Web, but the WiFi is not
secure. Or say your company lets you bring your laptop, but its
firewall blocks your favorite website. Is there no hope besides
paying $15 to a VPN provider?
There is, and it costs about $3.50 per month as of this writing.
The Spark tunable that gave me 8X speedup
Posted on August 7, 2019
(Last modified on March 7, 2021)
| 2 minutes
There are many configuration tunables in Spark. However, if you have
time for only one, set this one. It made one of our streaming
applications process data 8X faster, with no code change needed!
Getting top-N elements in Spark
Posted on May 11, 2019
(Last modified on March 7, 2021)
| 2 minutes
The documentation for the PySpark top() function has this warning:
This method should only be used if the resulting array is expected
to be small, as all the data is loaded into the driver’s memory.
This piqued my interest: why would you need to bring all the data to
the driver if all you need is the top few elements?
The answer is: it does not load all the data into the driver’s
memory.
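One way to see why: each partition can reduce itself to its own N largest elements, and only those small candidate lists need to reach the driver to be merged. The sketch below mimics that idea in plain Python, with partitions simulated as lists. The function name top_n and the list-of-lists input are assumptions for illustration, not PySpark's API, although the per-partition heapq.nlargest plus merge structure is, as far as I know, close to how PySpark's top() works.

```python
import heapq
from functools import reduce


def top_n(partitions, num, key=None):
    """Return the num largest elements across all partitions, largest first.

    Hypothetical helper: each inner list stands in for one partition
    of a distributed dataset.
    """
    # Step 1: each partition shrinks itself to at most `num` elements,
    # so no partition ever sends more than `num` items onward.
    per_partition = [heapq.nlargest(num, part, key=key) for part in partitions]

    # Step 2: merge the small candidate lists pairwise, keeping only
    # `num` elements after each merge, like a reduce on the driver.
    def merge(a, b):
        return heapq.nlargest(num, a + b, key=key)

    return reduce(merge, per_partition, [])
```

For example, top_n([[1, 5, 3], [9, 2], [7, 8, 4]], 2) returns [9, 8]. The driver only ever holds about N elements per partition, so the documentation's warning is about N being small, not about the size of the dataset.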
Livy is out of memory
Posted on March 11, 2018
(Last modified on March 7, 2021)
| 3 minutes
Spark jobs were failing. All of them. The data pipeline had stopped. This is a tale of high-pressure debugging.
Accessing home computer from anywhere
Posted on November 26, 2017
(Last modified on March 7, 2021)
| 5 minutes
Do you sometimes want to access your home computer from an outside
network? Maybe you are using another system that you do not trust,
and you would prefer your home computer for some workflows.
This post outlines the steps to make such access possible.
The program that would not go away
Posted on February 17, 2017
(Last modified on March 7, 2021)
| 6 minutes
This post is about a program hang in the Python process that was
running Ansible scripts. The problem was hard to debug and sent me
back to my Unix textbooks.
Correct way to create a directory in Python
Posted on February 2, 2017
(Last modified on March 7, 2021)
| 2 minutes
Can you see the problem with this code? It comes from Ansible v2.1.1.0:
    if not os.path.exists(value):
        os.makedirs(value, 0o700)
It’s quite straightforward. It checks whether a directory path exists.
If it does not, it creates the directory path, similar to mkdir -p.
What could be wrong?
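One well-known hazard with any check-then-create pattern is a race: if another process creates the directory between the os.path.exists() check and the os.makedirs() call, makedirs() raises an error even though the goal was already met. A minimal race-tolerant sketch, assuming the goal is simply "make sure this directory exists", tries the creation and treats "already exists" as success only when the path really is a directory. The helper name ensure_dir is hypothetical.

```python
import errno
import os


def ensure_dir(path, mode=0o700):
    """Create `path` and any missing parents, like `mkdir -p`.

    Hypothetical helper: instead of checking first and creating second,
    it attempts creation and tolerates the directory already existing.
    """
    try:
        os.makedirs(path, mode)
    except OSError as exc:
        # EEXIST is acceptable only if what exists is a directory;
        # a regular file at `path` is still an error.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

On Python 3.2 and later, os.makedirs(path, mode, exist_ok=True) expresses the same intent in one call. The try/except form also runs on older interpreters.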
Getting rid of unused virtual disks on XenServer
Posted on February 1, 2017
(Last modified on March 7, 2021)
| 4 minutes
A continuous test server I’d set up had stopped working. The
XenServer on which it was running had a 1TB disk, and it was full.
What was going on?
Log rotation, no code change needed
Posted on January 24, 2017
(Last modified on March 7, 2021)
| 4 minutes
This post shows how to rotate your application's old logs with no
change to the application code and no specialized logging library or
framework. It works for any language on a standard Unix platform.
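One common technique for rotating logs without touching the writer, which may differ from the approach the post itself takes, is copy-then-truncate (the idea behind logrotate's copytruncate option). You copy the live log aside, then truncate the original in place so the application keeps writing through its already-open file descriptor. The sketch below assumes the application opened its log in append mode (O_APPEND), and the helper name rotate_copytruncate and the .1, .2 suffixes are hypothetical. Lines written between the copy and the truncate can be lost.

```python
import os
import shutil


def rotate_copytruncate(log_path, keep=3):
    """Rotate log_path without the writing program noticing.

    Hypothetical helper: log -> log.1 -> log.2 ..., keeping `keep` old copies.
    """
    # Shift older copies up (log.2 -> log.3, log.1 -> log.2); the oldest
    # copy is overwritten and thereby dropped.
    for i in range(keep - 1, 0, -1):
        src = f"{log_path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{log_path}.{i + 1}")
    # Copy rather than rename: the writer's descriptor still refers to the
    # original file, so renaming it would keep filling the old copy.
    shutil.copy2(log_path, f"{log_path}.1")
    # Truncate in place; an O_APPEND writer continues at the new end.
    os.truncate(log_path, 0)
```

You would typically run this from cron. The application never needs to reopen its log file.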
Resetting a TCP connection and SO_LINGER
Posted on October 21, 2016
(Last modified on March 7, 2021)
| 12 minutes
Can you quickly close a TCP connection in your program by sending a
reset (“RST”) packet?
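With BSD-style sockets, the usual way to do this is to enable SO_LINGER with a linger time of zero before calling close(). Instead of the normal FIN exchange, the kernel discards any unsent data and sends an RST. The sketch below shows this in Python. The helper name abortive_close is hypothetical, and the "ii" packing assumes struct linger is two C ints, as it is on Linux.

```python
import socket
import struct


def abortive_close(sock):
    """Close `sock` with a TCP reset (RST) instead of a graceful FIN.

    SO_LINGER with l_onoff=1 and l_linger=0 tells the kernel to drop
    any unsent data and reset the connection on close().
    """
    # struct linger { int l_onoff; int l_linger; } on Linux.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()
```

The peer then sees the connection reset. In Python, its next recv() on that connection typically raises ConnectionResetError.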