Introduction to the Notebook
Note
We are using Jupyter Book [Community, 2020] to build this notebook. We did not take part in developing Jupyter Book (but we sure do thank its developers); our contribution is the content of this notebook.
Welcome to our introduction to latent Dirichlet allocation, or LDA [Blei et al., 2003], and its application. Our hope with this notebook is to present LDA in a way that makes it approachable as a machine learning technique. From “when to use LDA” to “applying LDA to talk about bias,” we have tried our best to cover the topic clearly. If we are missing anything, feel free to let us know using the button at the top right of the page.
Chapters
Notebook Introduction - Explains how to run this Jupyter Notebook in Binder, in Google Colab, or directly in the browser.
Latent Dirichlet Allocation (LDA) - Introduces topic modeling and LDA, including an example of applying LDA in Python.
Dirichlet Distribution - Examines the Dirichlet distribution, using the Chinese restaurant process to illustrate how it is derived and how it is used in LDA.
Jigsaw - an Implementation of LDA - Provides a use case for LDA by applying it to the Jigsaw Unintended Bias in Toxicity Classification dataset from Kaggle.
Visualizing and Analyzing Jigsaw - Visualizes and analyzes the results of applying LDA to the Jigsaw data.
References
I know it is tradition to put the references at the end of a book, but when you are standing on the shoulders of giants, you thank them first.
- Ald85
David J Aldous. Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII—1983, pages 1–198. Springer, 1985.
- Bau00
L. Frank Baum. The Wonderful Wizard of Oz. George M. Hill Company, 1900. URL: https://www.gutenberg.org/files/55/55-h/55-h.htm.
- bla17
blacksite. 2017. Accessed on 2021-08-06T18:24:07Z. URL: https://stackoverflow.com/questions/44208501/getting-topic-word-distribution-from-lda-in-scikit-learn.
- Ble17
David Blei. Prof. David Blei - Probabilistic Topic Models and User Behavior. YouTube, Feb 2017. URL: https://www.youtube.com/watch?v=FkckgwMHP2s&t=484s.
- Ble12
David M Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
- BNJ03
David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
- Bog17
Aja Bogdanoff. Saying goodbye to Civil Comments. Dec 2017. URL: https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d.
- BOH11
Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. D³: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.
- BLB+13
Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122. 2013.
- Com20
Executable Books Community. Jupyter Book. Feb 2020. doi:10.5281/zenodo.4539666.
- HUnitaryteam20
Laura Hanu and Unitary team. Detoxify. GitHub. https://github.com/unitaryai/detoxify, 2020.
- JG19
Jigsaw and Google. Jigsaw unintended bias in toxicity classification. 2019. URL: https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification.
- Liu20
Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press, 2020.
- Pas21
Panupong Pasupat. DP: Chinese restaurant process viewpoint. Jan 2021. URL: https://ppasupat.github.io//a9online/bayesian-nonparametrics/023-chinese-view.html.
- Ric19
Leonard Richardson. 2019. URL: https://beautiful-soup-4.readthedocs.io/en/latest/.
- TMN+14
Jian Tang, Zhaoshi Meng, Xuanlong Nguyen, Qiaozhu Mei, and Ming Zhang. Understanding the limiting factors of topic modeling via posterior contraction analysis. In International Conference on Machine Learning, 190–198. PMLR, 2014.
- Wer16
Wayne Werner. 2016. Accessed on 2021-08-04T13:42:07Z. URL: https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook.