SPEOC-pt-1

Code repository for a Summer Project on the Economic Origins of the Constitution. Contact me at chrisliao (at) uchicago (dot) edu. The project has two components - delegate matching and pre-post debt matching. The README is separated into two parts, one for each component. Note that README.md and code repository/project are a work in progress.

READ THIS BEFORE PROCEEDING

Since we're continuing work on this repo in summer 2022, I've moved everything from summer and winter 2021 to the S2021 folder. This means that code importing data will probably break. To fix it, add a ../ before the relative directory path which should enable the data to be imported.

I may fix this in a future iteration if I decide to revisit old code. If I do so, this section will either be ~~struckthrough~~ or deleted. I might just end up using the readme.md primarily to show off the cool work our team did and less about the technicalities of each file.

Overview of delegate matching

The purpose of this part of the project was to match constitutional and state convention delegates to their pre-1790 debt asset totals to identify how much each delegate owned and ananlyze if this influenced their vote.

The name matching was executed in three steps: cleaning, fuzzy-matching, and merging. We have structured the Dropbox subdirectory to make the code runnable as-is. In more detail, the folder consists of:

Data: This folder contains all of the raw data and cleaned data. Within the Delegates, Post1790, and Pre1790 subfolders are the corresponding raw Excel sheets (copied from the dropbox) and cleaned csv files. This folder also contains final_matching.csv and final_matching_post1790.csv which list the results of our fuzzy matching.
Producables:The requested final tables are in this folder, along with a folder called ‘Separate’ containing csv files with individual tables for each type of certificate
1. Delegate_Pre1790_Assets.csv contains the assets in Pre-1790 certificates for all delegates
2. Delegate_Post1790_Assets.csv contains the assets in Post-1790 certificates for all delegates
clean_debt_files.ipynb: This Jupyter notebook contains the code used to produce the cleaned csv files in the Data/Pre1790/Cleaned folder. Note: the post-1790 data was not cleaned.
fuzzy_matching.ipynb: This Jupyter notebook contains the code that created thefinal_matching.csvfile in the Data folder that we used to identify pre-1790 debt certificates held by delegates
fuzzy_matching_post1790.ipynb: This Jupyter notebook contains the code that created thefinal_matching_post1790.csvfile in the Data folder that we used to identify post-1790 debt certificates held by delegates
debt_aggregation.ipynb: This Jupyter notebook contains the code which merges the cleaned delegate names and debt certificate/stock data using thefinal_matching.csvfiles. It then merges those results to produceDelegate_Pre1790_Assetsin theProducablesfolder.
debt_aggregation_post1790.ipynb: This Jupyter notebook contains the code which merges the cleaned delegate names and debt certificate/stock data using thefinal_matching_post1790.csvfiles. It then merges those results to produceDelegate_Post1790_Assets.csvin theProducablesfolder.
Results.ipynb: This jupyter notebook contains the code which we used to generate results described in the technical report and located in the Results section.

Detailed Description of Jupyter notebooks **

`clean_debt_files.ipynb`

Its overall purpose is to clean raw data imported into DataFrames from Excel sheets. Special cases are names that had to be manually replaced because they didn't follow some sort of set rule. Here are what the notebook does to different data files.

Delegates - strips names and separates alternate last names (in parentheses in the raw data) into two full names
- Special case - row 44 of constitutional_convention_1787 was ‘Fitzsimons (FitzSimons; Fitzsimmons)’ and got modified to Fitzsimons (Fitzsimmons)
- Note: John Nesbitt (Nisbet) was missed in cleaning and might not have matched properly
Loan office certificates - removes empty rows and entries missing first or last names, splits full name entries with ‘&’ or ‘and’ into multiple names, handles cases of executors, heirs, ‘and co’, and treasurers
- Createsstate_companies.csvwith entries that have ‘&’ or ‘and’ but with incomplete names (i.e. only last name)
- Note: uses an entity classifier to detect organizations and remove them; results were manually checked for accuracy
  - See https://spacy.io/models for more detail about the entity classifier
- Special cases - seeexport_weird_names.csvfile in the Data folder; see chart below for cases changed manually
  - Entries marked with ** have an executor whose first name got accidentally dropped
  - Notes column contains other changes, errors, etc.
  - See Documentation/LOC_special_cases.md
Marine office certificates - removes entries with ‘deceased’ and ‘estate’, adding notes to a cleaning notes column
- Special cases: see chart, Hoov and Harrison were split into two last name columns
- See Documentation/marine_special_cases.md
Pierce certificates - strips entries and removes entries with no last name, added a second set of name columns
- Special cases - see chart; some split into two names
- See Documentation/pierce_special_cases.md
State liquidated debt - removes empty rows; splits entries with ‘&’ or ‘and’ into two names; removes phrases like ‘and co’, ‘& son’ , ‘& others’ and adds them to a Cleaning notes column; removes ‘estates’, ‘heir of’, and ‘deceased’, adding those to the notes
- Special cases - see chart; some entries in PA had ‘esastate’ and those were treated the same as ‘estate’
  - See Documentation/sld_special_cases.md

fuzzy_matching.ipynb

Its overall purpose is to run the cleaned delegate and debt certificate files through a fuzzy matching algorithm, yielding `final_matching.csv_

First, uses process.extract with a threshold of 85 to form a list of possible matches
- Note: process.extract uses a weighted ratio of all four fuzz ratios
Next, let m = min(# of words in name 1, # of words in name 2). This is the minimum threshold of word-pairs between the two names that must have a threshold score of 90 (a word pair is a pair of words consisting of one word from each of the two names that were considered to be a match)
If the match passes both steps, it is added to the csv file
In addition, if one name from the debt certificate files is matched to multiple delegates, we eliminate all but the matchign with the highest score.
See the Methods section of the report for a better explanation of step 2

fuzzy_matching_post1790.ipynb

It uses the same algorithm asfuzzy_matching.ipynb to generatefinal_matching_post1790.csv

debt_aggregation.ipynb

It merges state and constitutional convention delegates’ on the matched names fromfinal_matching.csv, then uses full names to merge with the cleaned pre-1790 loan certificate data to compile the total sum value and certificate count of each certificate type for each delegate. It also producesDelegate_Pre1790_Assets.csv

debt_aggregation_post1790.ipynb

It uses the same process to merge matched names with the post-1790 stocks and compile the total sum value and count of each type of stock for each delegate. It producesDelegates_Post1790_Assets.csv

Results.ipynb

This is the code which utilizes fuzzy matching and aggregated debt csv files to generate the tables and charts found in the Results section of our technical report

Detailed description of csv files

Raw data

Delegates
- Data/Delegates/constitutional_convention_1787.xlsx
- Data/Delegates/State Delegates.xlsx
Pre1790 Debt Files
- Data/Pre1790/ contains the raw Pre-1790 debt certificates that were copied over dropbox
Post1790 Debt Files
- Data/Post1790/ contains the raw Post-1790 debt certificates that were copied over dropbox, organized by state in the folders

Cleaned Data

These are products ofclean_debt_files.ipynb

Delegates - located inData/Delegates/cleaned
- State_Delegates_cleaned.csv
- constitutional_convention_delegates_cleaned.csv
Pre-1790 Debt Files - located inData/Pre1790/cleaned
- Marine_Liquidated_Debt_Certificates_cleaned.csv
- Pierce_Certs_cleaned_2021.csv
- liquidated_debt_certificates_CT_cleaned.csv
- liquidated_debt_certificates_DE_cleaned.csv
- liquidated_debt_certificates_MA_cleaned.csv
- liquidated_debt_certificates_NH_cleaned.csv
- liquidated_debt_certificates_NJ_cleaned.csv
- liquidated_debt_certificates_NY_cleaned.csv
- liquidated_debt_certificates_PA_stelle_cleaned.csv
- liquidated_debt_certificates_PA_story_cleaned.csv
- liquidated_debt_certificates_RI_cleaned.csv
- Loan_office_certificates_9_states_cleaned.csv

Intermediary Data

final_matching.csv - Lists all the fuzzy matches matches generated by fuzzy_matching.ipynb, with the initial process.extract score in the Scores column
- Used in debt_aggregation.ipynbto generate totals of delegate assets
final_matching_post1790.csv - Lists all the fuzzy matches matches generated by fuzzy_matching_post1790.ipynb, with the initial process.extract score in the Scores column
- Used in debt_aggregation_post1790.ipynb to generate totals of delegate assets
state_companies.csv - Lists all the entries from Loan_office_certificates_9_states.xlsx which contained ‘&’ or ‘and’ with two last names
- Ex. Clark & Nightingale

Final Data

Delegates_Pre1790_Assets.csv - final compiled table of pre-1790 assets of delegates, formatted as requested
- Product ofdebt_aggregation.ipynb
Delegates_Post1790_Assets.csv- final compiled table of post-1790 assets of delegates, formatted requested
- Product ofdebt_aggregation_post1790.ipynb
Separate - aggregated debt for each type of asset
- ASD_debt_Matched.csv
- CD_debt_Matched.csv
- Loan_Office_Certificates_Matched.csv
- Marine_Certificates_Matched.csv
- Pierce_Certificates_Matched.csv
- State_Certificates_Matched.csv

Something I wish we did:

Identify company owners - see our incomplete list atstate_companies.csv
- This would give us a more complete way to calculate a person’s holdings
Change abbreviated names (Thm - Thomas, J/Jn to John/Jonathan)

Overview of pre-post debt matching

In this step, I matched individuals who owned continental debt post-1790 to their pre-1790 debt assets. post1790_data_prep.ipynb generates the dataset of matches and asset totals, and post1790_data_analysis.ipynb generates results relating to this question.

Name		Name	Last commit message	Last commit date
Latest commit History 711 Commits
.idea		.idea
.vscode		.vscode
archive		archive
cleaning_CD		cleaning_CD
cleaning_asd		cleaning_asd
data_clean		data_clean
data_raw		data_raw
debt analysis		debt analysis
speculation		speculation
table_team		table_team
web_app		web_app
.gitignore		.gitignore
Holdings of Notables.ipynb		Holdings of Notables.ipynb
Manifest.toml		Manifest.toml
Project.toml		Project.toml
README.md		README.md
README2.md		README2.md
Summer_Proposal__Economic_Origins_of_the_Constitution.pdf		Summer_Proposal__Economic_Origins_of_the_Constitution.pdf
doc.md		doc.md
requirements.txt		requirements.txt
web_app.md		web_app.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SPEOC-pt-1

READ THIS BEFORE PROCEEDING

Overview of delegate matching

Detailed Description of Jupyter notebooks **

`clean_debt_files.ipynb`

fuzzy_matching.ipynb

fuzzy_matching_post1790.ipynb

debt_aggregation.ipynb

debt_aggregation_post1790.ipynb

Results.ipynb

Detailed description of csv files

Raw data

Cleaned Data

Intermediary Data

Final Data

Something I wish we did:

Overview of pre-post debt matching

About

Releases

Packages

Contributors 10

Languages

liaochris/SPEOC-pt-1

Folders and files

Latest commit

History

Repository files navigation

SPEOC-pt-1

READ THIS BEFORE PROCEEDING

Overview of delegate matching

Detailed Description of Jupyter notebooks **

clean_debt_files.ipynb

fuzzy_matching.ipynb

fuzzy_matching_post1790.ipynb

debt_aggregation.ipynb

debt_aggregation_post1790.ipynb

Results.ipynb

Detailed description of csv files

Raw data

Cleaned Data

Intermediary Data

Final Data

Something I wish we did:

Overview of pre-post debt matching

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 10

Languages

`clean_debt_files.ipynb`

Packages