Packages Good for Large Dataset Processing

Three criteria help evaluate whether a package or library is suitable for processing very large datasets.
1. Implement core functions in a language that is efficient for computation. C, C++, Fortran, and Rust are good at this. Python, R, and other high-level languages are not.
2. Support multithreading and even distributed computing. Relying on a single core and a single thread does not produce good performance. The package needs to support parallel execution, either on one computer or across a cluster.
3. Enable efficient indexing [Read More]
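The second criterion can be sketched with Python's standard library. This is a minimal illustration, not a benchmark: the helper names split_into_chunks and parallel_sum are hypothetical, and in CPython threads only speed up work that runs in compiled code that releases the global interpreter lock, which is why the first criterion matters as well.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Sum each chunk in a worker thread, then combine the partial sums."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Calling parallel_sum(list(range(1000))) gives the same result as the built-in sum. Only the way the work is divided changes.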

Use census API in R and Python

Access Census data in Python
Since the Census Bureau opened its data access API to developers, many Python packages have been developed to take advantage of it. There is now a long list of Python packages that allow users to access census data via scripts:
- census (works best with the us package)
- censusdis (census discovery)
- cenpy
- census-data-downloader (for ACS data)
- pygris (for TIGER boundary files)
Access Census data in R
There are also many packages in R that facilitate census data access. [Read More]
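The packages above wrap HTTP requests to the Census data API. As a rough sketch of what such a request looks like, the snippet below only builds a query URL with the standard library and does not contact the server. The function name build_census_url is hypothetical, and the year, dataset path, and variable code are illustrative assumptions, not values taken from the post.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_census_url(year, dataset, variables, geography, api_key=None):
    """Build a Census data API query URL (no network access happens here)."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        # The key is optional in this sketch; pass your own if you have one.
        params["key"] = api_key
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    query = urlencode(params, safe=",:*")
    return f"{BASE_URL}/{year}/{dataset}?{query}"


# Assumed example: total population (B01001_001E) for every state, ACS 5-year.
url = build_census_url(2020, "acs/acs5", ["NAME", "B01001_001E"], "state:*")
```

The resulting URL can be opened in a browser or passed to any HTTP client. The dedicated packages add conveniences such as geography lookups and data frames on top of this.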

Computational Environment on Mac OS

Set up Python on Mac OS
Mamba forge package
The best solution so far is the Miniforge release. The Mamba Forge combination is the most efficient one. Download the shell (.sh) installation file and run it in the Mac terminal. Then run mamba init zsh to initialize Python for the terminal. zsh is the default shell on Mac OS. If a different shell is used, change the shell name passed to mamba init. [Read More]

Regularly Report Computer IP with DHCP

As many people work from home during and after the pandemic, it is convenient, and often necessary, to connect remotely to office or home computers. However, devices on a wireless network or behind an ISP connection may have their IP address changed from time to time. This blog describes a short Python program that automatically and periodically reports a computer's IP address. The information is saved into a text file, which can be stored in Dropbox or another cloud drive. [Read More]
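A minimal sketch of such a reporter is shown below. It is not the program from the post. All function names are hypothetical, the address 8.8.8.8 is only used to let the operating system pick the outgoing network interface, and the result is the machine's local address. Reporting a public address would need an external lookup service, which is not shown.

```python
import socket
import time
from datetime import datetime


def get_local_ip():
    """Return the local IP of the default outgoing interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        sock.connect(("8.8.8.8", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def format_report(hostname, ip, when):
    """Format one tab-separated report line: timestamp, host, IP."""
    return f"{when.isoformat(timespec='seconds')}\t{hostname}\t{ip}"


def append_report(path, line):
    """Append one report line to the text file at path."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def report_periodically(path, interval_seconds, repeats):
    """Write the current IP to path, repeats times, pausing in between."""
    for i in range(repeats):
        line = format_report(socket.gethostname(), get_local_ip(), datetime.now())
        append_report(path, line)
        if i < repeats - 1:
            time.sleep(interval_seconds)
```

Pointing path at a file inside a synced folder, for example a Dropbox directory, makes the latest address visible from other machines. A call such as report_periodically(path, 3600, 24) would write one line per hour for a day.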