For this final project, the goal is to use Python to scrape the web and collect data that can produce meaningful visualizations. I chose to explore Traditional Chinese Medicine (TCM). I wanted to learn which herb is used most often across all the formulas I could find online. I decided to use a website called “Sacred Lotus” and collected all of the formula and herb data from it.

The easiest way to understand a TCM formula is to think of it as a prescription. A formula contains multiple herbs in varying quantities. A TCM doctor prescribes a formula based on the patient’s unique condition. There are also formulas that have been passed down from generation to generation to treat common diseases according to TCM’s categorization. Each formula contains a different combination of herbs, and these herbs are believed to work together to treat certain conditions.

It would be interesting to see which herbs are used most often across all the formulas on the Sacred Lotus website. First, I collected the URLs of all the formulas; then I gathered the herb names and the required quantity of each herb from each formula’s page. When a formula calls for a range of quantities for an herb, I used the maximum amount. For example, if a formula calls for 9-12 grams of licorice root, I recorded 12 grams of licorice root for that formula when building the final visualization.
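The "take the maximum of a range" rule can be sketched as a small parsing helper. This is a minimal illustration, assuming each scraped quantity arrives as a short text string such as "9-12 g" or "6 g"; the function name `max_quantity` and the input format are hypothetical, not taken from the Sacred Lotus pages or from the original scraper.

```python
import re
from typing import Optional

# Matches integers or decimals such as "9", "12", or "1.5".
# Assumption: quantity strings look like "9-12 g" or "6 g".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def max_quantity(text: str) -> Optional[float]:
    """Return the largest number in a quantity string.

    For a range like "9-12 g" this is the upper bound (12.0); for a
    single value like "6 g" it is that value. Returns None when the
    string contains no number (e.g. "to taste").
    """
    numbers = [float(n) for n in _NUMBER.findall(text)]
    return max(numbers) if numbers else None


print(max_quantity("9-12 g"))
print(max_quantity("6 g"))
```

Taking the maximum of every number in the string, rather than splitting on the hyphen, also tolerates other range separators such as "9 to 12 g", at the cost of misreading strings that contain unrelated numbers.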

I used Tableau Public to create the final visualization. The first view is a treemap labeled with herb names, and the second is a bubble chart in which each bubble’s size represents the total amount of that herb required across all formulas.
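Before loading the data into Tableau Public, the per-formula herb lists have to be rolled up into one row per herb. The sketch below shows one way to do that, assuming each formula has already been scraped into a list of (herb name, maximum amount) pairs; the helper names, the column names, and the sample herbs are hypothetical. It computes both how many formulas use each herb (one plausible measure for the treemap) and the summed amount (the bubble size), then writes CSV text that Tableau can import.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape of the scraped data: one list of
# (herb name, maximum amount) pairs per formula.
Formula = List[Tuple[str, float]]


def summarize(formulas: Iterable[Formula]) -> Dict[str, Tuple[int, float]]:
    """Return {herb: (number of formulas using it, total amount)}."""
    counts = Counter()
    totals = Counter()
    for formula in formulas:
        # Count each herb at most once per formula.
        for herb in {name for name, _ in formula}:
            counts[herb] += 1
        # Sum every listed amount toward the herb's total.
        for name, amount in formula:
            totals[name] += amount
    return {herb: (counts[herb], totals[herb]) for herb in counts}


def to_csv_text(summary: Dict[str, Tuple[int, float]]) -> str:
    """Render the summary as CSV text, one row per herb, sorted by name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["herb", "formula_count", "total_amount"])
    for herb, (count, total) in sorted(summary.items()):
        writer.writerow([herb, count, total])
    return buffer.getvalue()


# Made-up sample data for illustration only.
sample = [
    [("Licorice root", 12.0), ("Ginger", 9.0)],
    [("Licorice root", 6.0)],
]
print(to_csv_text(summarize(sample)))
```

Saving the returned text to a `.csv` file gives Tableau a flat table where `herb` can drive the labels and `total_amount` the bubble size.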

Please find the visualization here.