Three criteria for evaluating whether a package or library is suitable for processing very large datasets:
Implement core functions in a language that is efficient for computation
C, C++, Fortran, and Rust are good choices for this.
Python, R, and other high-level interpreted languages are much slower when the heavy computation runs in the language itself.
Support multithreading and, ideally, distributed computing
Relying on a single core and a single thread will not deliver good performance.
The library needs to run work in parallel, either across cores on one computer or across a cluster.
Enable efficient indexing
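To illustrate the indexing criterion, here is a minimal sketch in plain Python that compares a full scan with a lookup through a sorted index. The function names (build_sorted_index, range_query, range_query_scan) are made up for this illustration and do not come from any particular library; real large-data libraries use far more elaborate index structures.

```python
import bisect


def build_sorted_index(records, key):
    """Sort the records once and keep the keys in a parallel list."""
    ordered = sorted(records, key=key)
    keys = [key(r) for r in ordered]
    return keys, ordered


def range_query(keys, ordered, low, high):
    """Return records with low <= key <= high using two binary searches."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return ordered[start:stop]


def range_query_scan(records, key, low, high):
    """Same query without an index: every record has to be inspected."""
    return [r for r in records if low <= key(r) <= high]
```

Building the index costs one sort, but each later query needs only O(log n) comparisons to locate the matching slice instead of touching every record.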
[Read More]
Computation on Windows Arm64 Computers
Microsoft and other manufacturers have been releasing more arm64-based laptops and tablets. However, it is still quite inconvenient to set up those computers for scientific computing in Python, R, etc. Overall, an arm64 Windows computer is not recommended for data science work. Here are a few possible solutions.
General Solution
In general, we can use it as a Linux computer through WSL2. However, not all computational platforms and environments are available on Linux distributions such as Ubuntu.
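When scripts need to behave differently on such machines, a small check like the following can help. This is only a sketch: describe_platform is a hypothetical helper, and the WSL test relies on the assumption that the WSL kernel release string contains the word "microsoft", which is a common convention rather than a documented guarantee.

```python
import platform


def describe_platform(system=None, machine=None, release=None):
    """Report whether we are on an arm64 CPU and whether we are inside WSL.

    The arguments default to the values reported by the platform module;
    they can be passed explicitly to try the function with other values.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    release = (release or platform.release()).lower()
    return {
        "system": system,
        # Windows reports "ARM64", Linux reports "aarch64"
        "is_arm64": machine in {"arm64", "aarch64"},
        # Assumption: WSL kernels carry "microsoft" in the release string
        "is_wsl": system == "Linux" and "microsoft" in release,
    }


if __name__ == "__main__":
    print(describe_platform())
```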
[Read More]
Disable Global Protect (PAN) Autostart
Global Protect from Palo Alto Networks (PAN) is a widely used VPN client. However, it is quite annoying that the program starts automatically even when we only occasionally use the VPN to connect to remote computers. Here are a few tips for disabling the autostart of Global Protect on Windows and Mac, respectively.
Windows 11 Computers
PAN seems to have made this deliberately difficult. The usual methods for disabling autostart do not work easily.
[Read More]
Add SSL Certificate to IIS
Let’s Encrypt
Let’s Encrypt is a free, automated, and open Certificate Authority (CA), run for the public’s benefit. It gives people the digital certificates they need to enable HTTPS (SSL/TLS) for websites, for free, in the most user-friendly way it can.
Let’s Encrypt Client
ACME clients create certificates for websites and get them validated through Let’s Encrypt.
Win-ACME
win-acme is a good ACME client for IIS on Windows Server.
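Once a certificate is installed, it can be useful to check from a script how long it remains valid. The sketch below uses only the Python standard library; fetch_not_after and days_until_expiry are hypothetical helper names, and the hostname in the usage comment is a placeholder, not a real site from this post.

```python
import socket
import ssl
import time


def fetch_not_after(hostname, port=443, timeout=10):
    """Connect over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days left."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


# Usage (placeholder hostname, requires network access):
# print(days_until_expiry(fetch_not_after("www.example.org")))
```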
[Read More]
Develop Websites
Develop Websites using R
With RStudio and Quarto, there are quite a few options for developing simple but neat websites. From the New Project dialog in RStudio, we can create Quarto Website, Bookdown, blogdown, and Simple R Markdown website projects. For Quarto websites in particular, all the interactive capabilities offered by OJS can be incorporated.
A Quarto website is based on a Bootstrap template and has a simple appearance.
Importantly, Quarto and R Markdown documents can directly contain DIV elements, which many JavaScript libraries use as containers for charts, maps, or tables.
[Read More]
Speeding up Python
Python, as an interpreted scripting language, is not known for speed or execution efficiency. Many utility tools written in Python, such as package management tools, are not fast, and a number of them have been rewritten in C/C++. Here are a few tips for making Python faster in several different respects. This is an ongoing effort, and updates will be published when there is new information.
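As one small illustration of the general idea (move the hot loop out of interpreted Python and into compiled code), the sketch below compares an explicit loop with the built-in sum, which CPython implements in C. The function names are made up for this example, and the timings depend on the machine, so no specific numbers are claimed here.

```python
import timeit


def total_with_loop(n):
    """Add 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def total_with_builtin(n):
    """Same result, but the iteration happens inside the C-implemented sum()."""
    return sum(range(n))


def compare(n=1_000_000, repeat=3):
    """Return the best wall-clock time in seconds for each variant."""
    loop_t = min(timeit.repeat(lambda: total_with_loop(n), number=1, repeat=repeat))
    builtin_t = min(timeit.repeat(lambda: total_with_builtin(n), number=1, repeat=repeat))
    return loop_t, builtin_t


if __name__ == "__main__":
    loop_t, builtin_t = compare()
    print(f"loop: {loop_t:.4f}s  builtin: {builtin_t:.4f}s")
```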
[Read More]
Use census API in R and Python
Access Census data in Python
Since the Census Bureau opened its data access API to developers, many Python packages have been developed to take advantage of it. Currently, there is a long list of Python packages that allow us to access census data from scripts.
census (works best with the us package)
censusdis (census discovery)
cenpy
census-data-downloader (for ACS data)
pygris (for TIGER boundary files)
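For comparison with these packages, the underlying API can also be called directly. The sketch below uses only the standard library and is not taken from any of the packages listed above; build_acs5_url, rows_to_dicts, and fetch are hypothetical helper names, and the year and variable in the usage comment (B01001_001E, total population in the ACS) are only illustrative choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data"


def build_acs5_url(year, variables, geography, api_key=None):
    """Build a query URL for the ACS 5-year endpoint."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        params["key"] = api_key
    # Keep ':', ',' and '*' readable, as in the URLs shown in the API examples
    return f"{BASE_URL}/{year}/acs/acs5?{urlencode(params, safe=':,*')}"


def rows_to_dicts(rows):
    """The API returns a list of rows whose first row is the header."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch(url, timeout=30):
    """Download and parse one query (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return rows_to_dicts(json.load(response))


# Usage:
# url = build_acs5_url(2021, ["NAME", "B01001_001E"], "state:*")
# states = fetch(url)
```

The packages above wrap this kind of request and add conveniences such as variable lookup and geography handling, which is usually preferable for real projects.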
Access Census data in R
There are also many R packages that facilitate access to census data.
[Read More]
Computational Environment on Mac OS
Set up Python on Mac OS
Mambaforge package
The best solution so far is to use a Miniforge release. The combination of Mamba and conda-forge is the most efficient one. Download the shell (.sh) installer and run it in the Mac terminal. Use mamba init zsh to initialize the shell so that the terminal picks up this Python. zsh is the default shell on Mac OS. If a different shell is used, just change the parameter passed to mamba init.
[Read More]
node.js management
Managing node.js
Today, most JavaScript libraries are developed, tested, and distributed through the node.js ecosystem. Many libraries have stopped providing a standalone js file for web development. Instead, they ship as modules, similar to Python packages. Legacy systems still need a separate JS file, so node.js and npm are required to obtain it. For some JS libraries, this might be the only way to get the latest version.
[Read More]
Shortest Path in Polygons
Finding the shortest path between two points within a (simple) polygon can be efficiently solved by using the funnel algorithm. For more details, see blog 1, blog 2, Paper 1 and Paper 2.
A good Python implementation is available on GitHub from margaeor.
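To give a feel for the core step, here is a minimal sketch of the string-pulling stage, written in the style of the widely circulated "simple stupid funnel algorithm". It is not the margaeor implementation. It assumes the polygon has already been triangulated and that the triangles between start and goal have been reduced to an ordered list of portals, each given as a (left, right) pair of points as seen when walking toward the goal, with y pointing up. The first portal must be (start, start) and the last (goal, goal).

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive if b is counter-clockwise of a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _append_unique(path, point):
    """Avoid repeating a corner when the funnel restarts on the same vertex."""
    if path[-1] != point:
        path.append(point)


def funnel_path(portals):
    """Return the corner points of the shortest path through the portal sequence."""
    start, goal = portals[0][0], portals[-1][0]
    path = [start]
    apex = left = right = start
    apex_i = left_i = right_i = 0
    i = 1
    while i < len(portals):
        new_left, new_right = portals[i]
        # Try to move the right side of the funnel inward.
        if cross(apex, right, new_right) >= 0:
            if apex == right or cross(apex, left, new_right) < 0:
                right, right_i = new_right, i
            else:
                # Right crossed over left: the left vertex is a path corner.
                _append_unique(path, left)
                apex, apex_i = left, left_i
                left = right = apex
                left_i = right_i = apex_i
                i = apex_i + 1
                continue
        # Try to move the left side of the funnel inward.
        if cross(apex, left, new_left) <= 0:
            if apex == left or cross(apex, right, new_left) > 0:
                left, left_i = new_left, i
            else:
                # Left crossed over right: the right vertex is a path corner.
                _append_unique(path, right)
                apex, apex_i = right, right_i
                left = right = apex
                left_i = right_i = apex_i
                i = apex_i + 1
                continue
        i += 1
    _append_unique(path, goal)
    return path
```

For an L-shaped polygon with vertices (0,0), (4,0), (4,4), (3,4), (3,1), (0,1), a start at (0.5, 0.5) and a goal at (3.5, 3.5), the three interior diagonals all touch the reflex corner (3,1), and passing them as portals yields a path that bends exactly at that corner. Computing the triangulation and the portal sequence is the other half of the problem and is not covered by this sketch.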