Install and configure JupyterHub in an Amazon EC2 instance

DRAFT: so far JupyterHub works only over HTTP. I’ll add steps on how to use HTTPS, along with some other tweaks to the configuration. First of all, let’s create an Amazon EC2 instance. For this tutorial I set up a t2.micro instance based on 64-bit Ubuntu 18.04. I would like to access JupyterHub from remote locations using HTTPS, so during the creation of the instance I modified the Security Group to accept traffic on port 443 (the default for HTTPS) from all remote locations. To run tests without encryption I also opened port 8080. I advise you to open it before the …

Install Tesseract 3.0.5 in Ubuntu 16.04

I recently got involved in a project that required OCR (Optical Character Recognition) to extract text from images. After a bit of research, we decided to use Google’s Tesseract.

In particular, we decided to go for version 3.0.5 because it can save its output as a nicely formatted TSV file containing, among other things, information on the blocks of text appearing in the image and the location of the bounding boxes from which the text is extracted.
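To give an idea of what that TSV output makes possible, here is a minimal Python sketch that groups recognised words by block together with their bounding boxes. It assumes the usual column layout of Tesseract's TSV output (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where level 5 marks a word. The sample rows and the `words_by_block` helper are made up for illustration; they are not taken from a real run.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed Tesseract TSV column layout.
# The values are invented for illustration only.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "2\t1\t1\t0\t0\t0\t36\t92\t210\t24\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t90\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t136\t92\t110\t24\t95\tworld\n"
    "2\t1\t2\t0\t0\t0\t36\t200\t120\t24\t-1\t\n"
    "5\t1\t2\t1\t1\t1\t36\t200\t120\t24\t91\tSecond\n"
)

WORD_LEVEL = 5  # assumed: 1=page, 2=block, 3=paragraph, 4=line, 5=word


def words_by_block(tsv_text):
    """Return {block_num: [(word, (left, top, width, height)), ...]}."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    blocks = defaultdict(list)
    for row in reader:
        # Skip page/block/paragraph/line rows; they carry no word text.
        if int(row["level"]) != WORD_LEVEL:
            continue
        text = (row["text"] or "").strip()
        if not text:
            continue
        box = tuple(int(row[key]) for key in ("left", "top", "width", "height"))
        blocks[int(row["block_num"])].append((text, box))
    return dict(blocks)


if __name__ == "__main__":
    for block_num, words in sorted(words_by_block(SAMPLE_TSV).items()):
        print(block_num, " ".join(word for word, _ in words))
```

With a real file, you would read its contents and pass the text to `words_by_block` in place of `SAMPLE_TSV`.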

The Ubuntu 16.04 repositories contain version 3.0.4 of Tesseract. Installing version 3.0.5 was not hard, but it required a bit of reading from various sources (the Tesseract wiki and Leptonica’s documentation) and a bit of fiddling around to put files in the right system locations, which is why I decided to collect all the steps I followed in this blog post.
