Introduction to the Notebook

Note

We use Jupyter Book [Community, 2020] for our development. We had no part in developing Jupyter Book (but we sure do thank its developers); our contribution is the content contained in this notebook.

Welcome to our introduction to, and application of, latent Dirichlet allocation (LDA) [Blei et al., 2003]. Our hope with this notebook is to make LDA approachable as a machine learning technique. From “when to use LDA” to “applying LDA to talk about bias,” we tried our best to cover the topic clearly. If we are missing anything, feel free to click on the GitHub logo button at the top-right side of the page.

Chapters

Notebook Introduction - Provides details on how to run this Jupyter Notebook in Binder, Google Colab, or even in the browser itself.

Latent Dirichlet Allocation (LDA) - Introduces topic modeling and LDA, including an example of its application using Python.

Dirichlet Distribution - Provides a look at the Dirichlet distribution, using the Chinese restaurant process to illustrate how it is derived and used in LDA.

Jigsaw - an Implementation of LDA - Provides a use case for LDA by coupling it with Unintended Bias (a dataset from Kaggle).

Visualizing and Analyzing Jigsaw - Finally, we take the results from LDA + Jigsaw and provide visualization and analysis of the findings.

References

We know it is tradition to have references at the end of books, but when you are standing on the shoulders of giants, you thank them first.

Ald85

David J Aldous. Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII—1983, pages 1–198. Springer, 1985.

Bau00

L. Frank Baum. The Wonderful Wizard of Oz. George M. Hill Company, 1900. URL: https://www.gutenberg.org/files/55/55-h/55-h.htm.

bla17

blacksite. 2017. Accessed on 2021-08-06T18:24:07Z. URL: https://stackoverflow.com/questions/44208501/getting-topic-word-distribution-from-lda-in-scikit-learn.

Ble17

David Blei. Prof. David Blei - Probabilistic Topic Models and User Behavior. YouTube, Feb 2017. URL: https://www.youtube.com/watch?v=FkckgwMHP2s&t=484s.

Ble12

David M Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

BNJ03

David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Bog17

Aja Bogdanoff. Saying goodbye to civil comments. Dec 2017. URL: https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d.

BOH11

Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. D³ data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.

BLB+13

Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122. 2013.

Com20

Executable Books Community. Jupyter Book. Feb 2020. doi:10.5281/zenodo.4539666.

HUnitaryteam20

Laura Hanu and Unitary team. Detoxify. GitHub, 2020. URL: https://github.com/unitaryai/detoxify.

JG19

Jigsaw and Google. Jigsaw unintended bias in toxicity classification. 2019. URL: https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification.

Liu20

Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press, 2020.

Pas21

Panupong Pasupat. DP: Chinese restaurant process viewpoint. Jan 2021. URL: https://ppasupat.github.io//a9online/bayesian-nonparametrics/023-chinese-view.html.

Ric19

Leonard Richardson. 2019. URL: https://beautiful-soup-4.readthedocs.io/en/latest/.

TMN+14

Jian Tang, Zhaoshi Meng, Xuanlong Nguyen, Qiaozhu Mei, and Ming Zhang. Understanding the limiting factors of topic modeling via posterior contraction analysis. In International Conference on Machine Learning, 190–198. PMLR, 2014.

Wer16

Wayne Werner. 2016. Accessed on 2021-08-04T13:42:07Z. URL: https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook.