Following is the district-level fiscal data from 2015-16.
LEAID | CENSUSID | FIPST | CONUM | CSA | CBSA | NAME | STNAME | STABBR | SCHLEV | ... | FL_61V | FL_66V | FL_W01 | FL_W31 | FL_W61 | FL_V95 | FL_V02 | FL_K14 | FL_CE1 | FL_CE2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100002 | N | 1 | 01073 | 142 | 13820 | Alabama Youth Services | Alabama | AL | N | ... | M | M | M | M | M | M | M | M | M | M |
1 | 100005 | 01504840100000 | 1 | 01095 | 290 | 10700 | Albertville City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
2 | 100006 | 01504800100000 | 1 | 01095 | 290 | 10700 | Marshall County | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
3 | 100007 | 01503740100000 | 1 | 01073 | 142 | 13820 | Hoover City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
4 | 100008 | 01504530100000 | 1 | 01089 | 290 | 26620 | Madison City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18703 | 5680250 | 51502040100000 | 56 | 56039 | N | 27220 | Region V BOCES | Wyoming | WY | 07 | ... | N | N | N | N | N | N | N | N | N | N |
18704 | 5680251 | N | 56 | 56021 | N | 16940 | Wyoming Department of Family Services | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18705 | 5680252 | N | 56 | 56005 | N | 23940 | Youth Emergency Services Inc. - Administration... | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18706 | 5680253 | N | 56 | 56025 | N | 16220 | Wyoming Behavioral Institute - Administration ... | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18707 | 5680254 | N | 56 | 56013 | N | 40180 | Saint Stephen's Indian School Admin Office | Wyoming | WY | 03 | ... | N | N | N | N | N | N | N | N | N | N |
18708 rows × 260 columns
It seems like the federal funding (revenue) follows a power log distribution, with a few schools recieving most of the federal fundings.
Revenue | |
---|---|
STNAME | |
California | 7709274795 |
Texas | 6194316950 |
New York | 3374793578 |
Florida | 3147328988 |
Illinois | 2334944853 |
Pennsylvania | 2037314939 |
Ohio | 1837962930 |
Georgia | 1815241986 |
Michigan | 1731033937 |
North Carolina | 1587975932 |
Arizona | 1302009922 |
New Jersey | 1249740990 |
Louisiana | 1115618894 |
Washington | 1098331976 |
Tennessee | 1096181996 |
Virginia | 1058145818 |
Indiana | 1015475961 |
Missouri | 959977972 |
Kentucky | 880295974 |
South Carolina | 860866990 |
Maryland | 823598998 |
Massachusetts | 804594987 |
Alabama | 803906916 |
Wisconsin | 782646932 |
Colorado | 721718867 |
Oklahoma | 703224894 |
Mississippi | 690723968 |
Minnesota | 685054951 |
Arkansas | 606945973 |
Oregon | 582559990 |
New Mexico | 516288985 |
Connecticut | 484185984 |
Iowa | 464851996 |
Kansas | 453921959 |
Utah | 419641976 |
Nevada | 405789000 |
West Virginia | 360283000 |
Nebraska | 346825973 |
Alaska | 307320000 |
Hawaii | 261131000 |
Idaho | 248545991 |
District of Columbia | 226201985 |
Montana | 220339869 |
South Dakota | 196643958 |
Rhode Island | 188203996 |
Maine | 186522987 |
New Hampshire | 169165780 |
North Dakota | 155452961 |
Delaware | 144706995 |
Wyoming | 123011974 |
Vermont | 111890912 |
<matplotlib.axes._subplots.AxesSubplot at 0x7f5915655588>
LEAID | CENSUSID | FIPST | CONUM | CSA | CBSA | NAME | STNAME | STABBR | SCHLEV | ... | FL_61V | FL_66V | FL_W01 | FL_W31 | FL_W61 | FL_V95 | FL_V02 | FL_K14 | FL_CE1 | FL_CE2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100002 | N | 1 | 01073 | 142 | 13820 | Alabama Youth Services | Alabama | AL | N | ... | M | M | M | M | M | M | M | M | M | M |
1 | 100005 | 01504840100000 | 1 | 01095 | 290 | 10700 | Albertville City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
2 | 100006 | 01504800100000 | 1 | 01095 | 290 | 10700 | Marshall County | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
3 | 100007 | 01503740100000 | 1 | 01073 | 142 | 13820 | Hoover City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
4 | 100008 | 01504530100000 | 1 | 01089 | 290 | 26620 | Madison City | Alabama | AL | 03 | ... | M | M | R | R | R | M | M | M | M | M |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18703 | 5680250 | 51502040100000 | 56 | 56039 | N | 27220 | Region V BOCES | Wyoming | WY | 07 | ... | N | N | N | N | N | N | N | N | N | N |
18704 | 5680251 | N | 56 | 56021 | N | 16940 | Wyoming Department of Family Services | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18705 | 5680252 | N | 56 | 56005 | N | 23940 | Youth Emergency Services Inc. - Administration... | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18706 | 5680253 | N | 56 | 56025 | N | 16220 | Wyoming Behavioral Institute - Administration ... | Wyoming | WY | N | ... | N | N | N | N | N | N | N | N | N | N |
18707 | 5680254 | N | 56 | 56013 | N | 40180 | Saint Stephen's Indian School Admin Office | Wyoming | WY | 03 | ... | N | N | N | N | N | N | N | N | N | N |
18708 rows × 260 columns
The following are the unique values from the table. Most non-numeric values are of the form x-y GEx or LEx.
In order to convert this to a numeric value the following steps were performed.
- Range values: Mean of the range was taken
- GEx: Mean of x and 100 was taken
- LEx: Mean of x and 0 was taken (Essentially x/2)
- Null/Missing: Replaced by a positive normal distribution with the mean and sigma of the original distribution.
Finally, KL Divergence of the Original distribution vs the imputed distribution was compared to KL Divergence of the Original distribution and random normal distribution.
array(['42', '39', '66', '72', '37', '43', '69', '21', '62', '47', '34',
'60', '23', '54', '48', '46', '50', '11', '19', '35', '51', '13',
'26', '32', '22', '27', '41', '49', '44', '31', '38', '67', '53',
'33', '20', '36', '14', '63', '40', '56', '45', '52', '10', '17',
'29', '28', '87', '59', '58', '78', '10-14', '30', '25-29',
'30-34', '60-64', '30-39', '61', '45-49', '68', '35-39', '11-19',
'LT50', 'PS', '65-69', '80-89', '85-89', '55-59', 'LE10', '55',
'2', '15-19', '40-44', '50-54', 'LE20', '16', 'LE5', '20-29',
'21-39', '70-74', '15', 'GE90', '86', '76', '40-49', '24', '75-79',
'74', '25', '50-59', '70-79', '6-9', '85', '7', '40-59', '60-79',
'80-84', '20-24', '71', '65', '8', '64', '82', 'GE50', '80', '70',
'77', '88', '83', '93', '81', '57', '75', '84', '92', '73', 'GE95',
'18', '12', '6', '9', '3', '90', 'LE1', '4', 'GE80', '5', '60-69',
'79', '89', '98', '91', '95', '90-94', '94', 'GE99', '96', '97'],
dtype=object)
def unblur(x):
if '-' in x:
return (int(x.split('-')[0]) + int(x.split('-')[1])) / 2
if 'GE' in x:
return ((int(x.replace('GE', '')) + 100)/2)
if 'LE' in x:
return ((int(x.replace('LE', '')) + 0)/2)
fiscal_math_df['ALL_MTH00PCTPROF_1516'].apply(unblur).unique()
array([ nan, 12. , 27. , 32. , 62. , 34.5, 47. , 37. , 15. , 67. , 84.5,
87. , 57. , 5. , 17. , 42. , 52. , 10. , 2.5, 24.5, 30. , 72. ,
95. , 44.5, 77. , 54.5, 74.5, 7.5, 49.5, 69.5, 82. , 22. , 75. ,
97.5, 64.5, 92. , 90. , 0.5])
fiscal_math_df['Unblurred_Scores'] = fiscal_math_df['ALL_MTH00PCTPROF_1516'].apply(unblur)
fiscal_math_df['Unblurred_Scores'].plot.hist(bins = 100)
<matplotlib.axes._subplots.AxesSubplot at 0x7f5927be9a58>
The distribution looks like a normal distribution that flattens out at 0. This makes sense since a lower score than 0 is not possible.
As such, we use a normal distribution to estimate the distribution of the original scores. We use KL Divergence to figure out goodness of fit
fiscal_math_df['Unblurred_Scores'] = fiscal_math_df['ALL_MTH00PCTPROF_1516'].apply(unblur)
unblurred_mean = fiscal_math_df['Unblurred_Scores'].mean()
unblurred_sigma = fiscal_math_df['Unblurred_Scores'].std()
fiscal_math_df['Unblurred_Scores'] = fiscal_math_df['Unblurred_Scores']\
.fillna(pd.Series([abs(np.random.normal(unblurred_mean,unblurred_sigma)) for i in range(len(fiscal_math_df))]))
fiscal_math_df['Unblurred_Scores'].plot.hist(bins = 100)
<matplotlib.axes._subplots.AxesSubplot at 0x7f5927b8e518>
def kl_divergence(p, q):
return np.sum(np.where(p != 0, p * np.log(p / q), 0))
kl_divergence(fiscal_math_df['Unblurred_Scores'].sample(4955),
list(fiscal_math_df[fiscal_math_df['ALL_MTH00PCTPROF_1516'].str.isnumeric()]\
['ALL_MTH00PCTPROF_1516'].astype(int)))
45565.525218373106
kl_divergence(np.random.uniform(0,100,4955),
list(fiscal_math_df[fiscal_math_df['ALL_MTH00PCTPROF_1516'].str.isnumeric()]\
['ALL_MTH00PCTPROF_1516'].astype(int)))
97121.56023652945
KL divergence as we can see is about twice as good as a random distribution when compared to the original unmodified distribution and should be a decent method of imputation.
fiscal_df['TFEDREV'].sum() * .15
8340410870.7
to_cut = fiscal_df['TFEDREV'].sum() * .15
tocut_df = fiscal_df[['LEAID','TOTALEXP','V33']]
tocut_df = tocut_df[(tocut_df['TOTALEXP'] > 0) & (tocut_df['V33'] > 0)]
tocut_df['percapita_spending'] = tocut_df['TOTALEXP']/tocut_df['V33']
for i, v in tocut_df.sort_values('percapita_spending', ascending = False).iterrows():
if to_cut - v['TOTALEXP'] * .20 > 0:
tocut_df.loc[i, 'to_cut'] = v['TOTALEXP'] * .20
to_cut -= (v['TOTALEXP'] * .20)
else:
tocut_df.loc[i, 'to_cut'] = 0
tocut_df['to_cut'].plot.hist(bins = 100)
<matplotlib.axes._subplots.AxesSubplot at 0x7f58fb3e4240>
In order to effectively cut the budget in a way that's least damaging to the schools alraedy struggling, the method I used is as follows:
- Take the sum total of the amount to be cut.
- Sort the districts by the funding recieved in decreasing order.
- Remove 20% from the top n, till the budget cut is satisfied.
This is a methiod that saves 15,000 districts from budget cuts while only marginally increasing the cuts for the best performing schools and is in my opinion the best way to cut the budget with minimal damage.