Open Health Inspection Specification

From NYC OpenData Tech Standards


NOTE: This specification is being drafted iteratively. Data consumers should be prepared to adapt to frequent format changes until the specification stabilizes.

As of November 20, 2012, this specification has been brought into alignment with the Open Health Inspection Standard (OHIS), which was authored by John Boiles with contributions from multiple U.S. cities.


This is a preliminary/proposed specification for government (or government-delegated) entities that:

Coverage Considerations


Data must be provided through several CSV files, delivered within a single .zip archive with a flat hierarchy (no sub-directories/folders).

The following requirements apply to those files:


All required CSV files and any available optional CSV files must be collected in a single ZIP file.
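
The packaging rule above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function names `check_archive` and `build_archive` are hypothetical, and the set of required files is taken from the "This file is required" statements in the sections below.

```python
import io
import zipfile

# Files marked "required" in the sections of this specification.
REQUIRED_FILES = {"businesses.csv", "inspections.csv"}


def check_archive(zip_bytes):
    """Return a list of packaging problems in an OHIS .zip (empty if none)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    for name in names:
        # A flat hierarchy means no entry may sit inside a directory.
        if "/" in name or "\\" in name:
            problems.append(f"not at top level: {name}")
    for missing in sorted(REQUIRED_FILES - set(names)):
        problems.append(f"missing required file: {missing}")
    return problems


def build_archive(files):
    """Build an in-memory .zip from a {name: text} mapping (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


flat = build_archive({"businesses.csv": "", "inspections.csv": ""})
nested = build_archive({"data/businesses.csv": ""})
print(check_archive(flat))
print(check_archive(nested))
```

A file placed in a sub-directory is reported twice here, once for its location and once because the required name is then absent from the top level.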

Feed Information (feed_info.csv)

The feed_info.csv file contains information about the feed itself and should contain only a single row. This file is optional.

| Name | Type | Required | Description |
|---|---|---|---|
| feed_date | date | Yes | Date this feed was generated in YYYYMMDD format |
| feed_version | string | Yes | Version of the OHIS specification used to generate this feed. For example, '0.4.1' |
| municipality_name | string | Yes | Name of the municipality providing this feed. For example 'San Francisco' or 'Multnomah County' |
| municipality_url | string | No | URL of the publishing municipality's website |
| contact_email | string | No | Email address of the person to contact regarding invalid data in this feed |
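
As an illustration of the fields above, a publisher might generate the single feed_info.csv row as follows. This is a sketch under two assumptions that are not stated in this section: that the first line of each CSV file holds the column names, and that optional fields may be left empty. The function name `render_feed_info` is hypothetical.

```python
import csv
import io
from datetime import date

FEED_INFO_FIELDS = [
    "feed_date",
    "feed_version",
    "municipality_name",
    "municipality_url",
    "contact_email",
]


def render_feed_info(generated_on, version, municipality, url="", email=""):
    """Return the text of a one-row feed_info.csv (header row is an assumption)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FEED_INFO_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerow({
        # The specification asks for YYYYMMDD, with no separators.
        "feed_date": generated_on.strftime("%Y%m%d"),
        "feed_version": version,
        "municipality_name": municipality,
        "municipality_url": url,
        "contact_email": email,
    })
    return buffer.getvalue()


print(render_feed_info(date(2012, 11, 20), "0.4.1", "San Francisco"))
```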

Businesses (businesses.csv)

The businesses.csv file contains information about businesses. This file is required.

| Name | Type | Required | Description |
|---|---|---|---|
| business_id | string | Yes | Unique identifier for the business. For many cities, this may be the license number. |
| name | string | Yes | Common name of the business |
| address | string | Yes | Street address of the business. For example "706 Mission St" |
| city | string | No | City of the business. This field must be included if the file contains businesses from multiple cities |
| state | string | No | State or province for the business. In the U.S. this should be the two-letter code for the state |
| postal_code | string | No | Postal code. For the US, standard 5-digit ZIP code or ZIP+4 code |
| latitude | number | No | Latitude of the business. This field must be a valid WGS 84 latitude. For example "37.7859547" |
| longitude | number | No | Longitude of the business. This field must be a valid WGS 84 longitude. For example "-122.4024658" |
| phone_number | string | No | Phone number for a business including country specific dialing information. For example "+14159083801" |
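
A data consumer may want to check the required fields and the coordinate ranges before using a row. The sketch below is one possible check, not a prescribed validator. It assumes that the first line of the file holds the column names, and the function name `check_business` and the business "Example Cafe" are made up for illustration.

```python
import csv
import io

REQUIRED_BUSINESS_FIELDS = ("business_id", "name", "address")
# WGS 84 bounds in decimal degrees.
COORDINATE_LIMITS = {"latitude": 90.0, "longitude": 180.0}

# Assumption: the first row holds column names. "Example Cafe" is made up.
SAMPLE_BUSINESSES = """business_id,name,address,latitude,longitude
10,Example Cafe,706 Mission St,37.7859547,-122.4024658
11,,1 Main St,97.5,-122.4
"""


def check_business(row):
    """Return a list of problems for one businesses.csv row (empty if none)."""
    problems = []
    for field in REQUIRED_BUSINESS_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"{field} is required")
    for field, limit in COORDINATE_LIMITS.items():
        value = (row.get(field) or "").strip()
        if not value:
            continue  # optional field left blank
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{field} is not a number: {value}")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{field} out of range: {value}")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_BUSINESSES)))
for row in rows:
    print(row["business_id"], check_business(row))
```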

Inspections (inspections.csv)

The inspections.csv file contains information about inspectors’ visits to businesses. This file is required.

| Name | Type | Required | Description |
|---|---|---|---|
| business_id | string | Yes | Unique identifier of the business for which this inspection was done |
| score | number | Yes | Inspection score on a 0-100 scale. 100 is the highest score |
| date | date | Yes | Date of the inspection in YYYYMMDD format |
| description | string | No | Single line description containing details on the outcome of an inspection. Use of this field is only encouraged if no violations are provided. |
| type | string | No | String representing the type of inspection. Must be one of the following values: initial, routine, followup |
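
The constraints in this table (a 0-100 score, a YYYYMMDD date, and a fixed set of inspection types) lend themselves to a simple row check. The following is a sketch only; `parse_ohis_date` and `check_inspection` are hypothetical names, and treating both 0 and 100 as valid scores is an assumption based on the phrase "0-100 scale".

```python
from datetime import datetime

INSPECTION_TYPES = {"initial", "routine", "followup"}


def parse_ohis_date(text):
    """Parse a YYYYMMDD string into a date, raising ValueError otherwise."""
    # strptime alone would accept short forms such as "2012115", so the
    # length and digit checks are done explicitly first.
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not in YYYYMMDD format: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def check_inspection(row):
    """Return a list of problems for one inspections.csv row (empty if none)."""
    problems = []
    if not (row.get("business_id") or "").strip():
        problems.append("business_id is required")
    try:
        score = float(row.get("score") or "")
    except ValueError:
        problems.append("score is required and must be a number")
    else:
        if not 0 <= score <= 100:
            problems.append(f"score outside 0-100: {score:g}")
    try:
        parse_ohis_date((row.get("date") or "").strip())
    except ValueError as error:
        problems.append(f"date {error}")
    kind = (row.get("type") or "").strip()
    if kind and kind not in INSPECTION_TYPES:
        problems.append(f"unknown type: {kind}")
    return problems


good = {"business_id": "10", "score": "94", "date": "20121120", "type": "routine"}
bad = {"business_id": "10", "score": "104", "date": "2012-11-20", "type": "annual"}
print(check_inspection(good))
print(check_inspection(bad))
```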

Violations (violations.csv)

The violations.csv file contains information about specific violations. This file is optional.

| Name | Type | Required | Description |
|---|---|---|---|
| business_id | string | Yes | Unique identifier of the business for which this violation applies |
| date | date | Yes | Date of violation in YYYYMMDD format. This should correspond with the related inspection |
| code | string | No | Code for the violation. It is recommended that this be based on the FDA Food Code. However, municipalities can decide to use pre-existing codes for this field |
| description | string | No | One line description of the violation |
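
Because violations carry no inspection identifier of their own, a consumer has to match them to inspections through the business_id and date pair. The sketch below shows one way to do that. It assumes a header row in each file, the function name `attach_violations` is hypothetical, and the violation codes and descriptions are invented placeholders rather than real FDA Food Code entries.

```python
import csv
import io
from collections import defaultdict

# Assumption: the first row of each file holds column names.
INSPECTIONS_CSV = """business_id,score,date,type
10,94,20121120,routine
10,82,20120515,routine
"""

# Codes H1-H3 and their descriptions are invented placeholders.
VIOLATIONS_CSV = """business_id,date,code,description
10,20120515,H1,Hypothetical violation one
10,20120515,H2,Hypothetical violation two
10,20110101,H3,Hypothetical violation with no matching inspection
"""


def attach_violations(inspection_rows, violation_rows):
    """Pair each inspection with the violations sharing its business_id and date.

    Returns (attached, orphans): attached is a list of (inspection, violations)
    tuples, orphans is the list of violations that matched no inspection.
    """
    by_key = defaultdict(list)
    for violation in violation_rows:
        by_key[(violation["business_id"], violation["date"])].append(violation)
    attached = []
    matched = set()
    for inspection in inspection_rows:
        key = (inspection["business_id"], inspection["date"])
        attached.append((inspection, by_key.get(key, [])))
        matched.add(key)
    orphans = [v for key, group in by_key.items() if key not in matched for v in group]
    return attached, orphans


inspections = list(csv.DictReader(io.StringIO(INSPECTIONS_CSV)))
violations = list(csv.DictReader(io.StringIO(VIOLATIONS_CSV)))
attached, orphans = attach_violations(inspections, violations)
for inspection, found in attached:
    print(inspection["date"], [v["code"] for v in found])
print("unmatched:", [v["code"] for v in orphans])
```

Reporting unmatched violations separately, rather than dropping them, makes it easier to spot feeds where the violation date does not correspond with any inspection.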

Score Legend (legend.csv)

The legend.csv file contains mappings from score ranges to human-readable descriptions of those scores. Municipalities can use this file to communicate the way scores are traditionally presented. For example, 0-60 may map to ‘Fail’ or 95-100 may map to ‘A+’. Note that maximum_score is non-inclusive. This means that for minimum_score = 70 and maximum_score = 80, any score greater than or equal to 70 and less than 80 falls within the range. This file is optional.

| Name | Type | Required | Description |
|---|---|---|---|
| minimum_score | number | Yes | Minimum score that can be classified with this description |
| maximum_score | number | Yes | Maximum score that can be classified with this description. maximum_score is non-inclusive, meaning that only scores less than this maximum score will fall within this range. |
| description | string | Yes | Formatted version of the score in the format typically presented by the municipality. For example 'A' or 'Pass' |
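
The range rule amounts to a half-open interval test, minimum_score <= score < maximum_score. The following sketch applies it to a made-up legend; the band boundaries, the function names `load_legend` and `describe_score`, and the header row are all assumptions for illustration.

```python
import csv
import io

# Hypothetical legend; the bands are made up for illustration.
LEGEND_CSV = """minimum_score,maximum_score,description
0,70,Fail
70,90,Pass
90,100,A
"""


def load_legend(text):
    """Parse legend.csv text into (minimum, maximum, description) tuples."""
    return [
        (float(row["minimum_score"]), float(row["maximum_score"]), row["description"])
        for row in csv.DictReader(io.StringIO(text))
    ]


def describe_score(score, legend):
    """Return the description whose range contains score, or None."""
    for minimum, maximum, description in legend:
        # minimum_score is inclusive, maximum_score is non-inclusive.
        if minimum <= score < maximum:
            return description
    return None


legend = load_legend(LEGEND_CSV)
for score in (69.9, 70, 90, 100):
    print(score, describe_score(score, legend))
```

Read literally, the non-inclusive maximum means a score of 100 matches no row in this sample, because no band has a maximum_score above 100. How publishers should express a top band that includes a perfect score is not settled by the text above.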

Sample Data

OHIS-compatible sample datasets are available from the following jurisdictions:

Known Issues

Variations in Jurisdiction Scoring

While the US FDA provides guidelines for inspecting food service establishments, many of the major US municipalities employ different models for scoring them and making that information public.

For example:

These values are often derived from observed violations, of which each type may carry a different weight in terms of the overall score. While it is generally agreed that there should be a national approach to scoring, requiring it in the short term may be a barrier to overall adoption of the specification. The current version of the specification follows the San Francisco scoring model.

26 May 2012 16:43:20
Should we remove cuisine_type, even though it's optional? It will probably have different pre-defined values between jurisdictions; not sure if it would serve a huge value to the data consumers.
26 May 2012 17:25:24
Anyone have any thoughts on how to merge this structure in?
27 May 2012 16:17:48
I think cuisine_type should be included, and perhaps, even make it a multi-value field (e.g. irish, microbrewery, chinese, fusion)
29 May 2012 16:46:20
I'm all for optionally including it, but generally I think it should only be included if it's a factor which impacts the score in some way (for example, perhaps certain aspects of an inspection are not necessary in certain types of establishments).
29 May 2012 19:12:44
Point taken. The bigger issue might be whether there is a "central" dataset about Restaurants where cuisine type info should live. It appears that the Inspection Results data is the "de facto" place to get this information. Also, for the venue code, will that be a citywide code that can be used in other datasets, or will it just be the UID for this particular dataset?
30 May 2012 04:57:31
Yes- it makes sense to have a separate dataset of licensed establishments. You're right that Restaurant Inspections are the de facto way to find that information.

For the venue code, in theory it would be a unique identifier across multiple datasets. I will add that into the description as a recommendation (but not necessarily a requirement).
30 May 2012 16:32:25
That kind of detail is really very useful.  It may prove instructive to look at the native schema of the DOHMH database from which the current Restaurant Inspection Results extract is taken to see what other properties can be captured, now that you're looking into creating a Restaurant Inspection specification and considering a dataset for licensed establishments.
8 August 2012 18:45:59
Great start!  Very pragmatic and formulated for expedited adoption.  It would be nice though if the spec also calls for some file naming convention.  Perhaps, something similar to but with some tweaks.
30 August 2012 14:27:45
Just wanted to say thanks for this; it's excellent! After getting a relatively large number of MalformedCSV (Ruby) exceptions thrown by the original NYC data file, I got _none_ processing the beta-OHIS restaurants.csv.

For context, the primary errors I receive in the current NYC file are mostly related to the "dba" (restaurant name; "doing business as") field, such as unescaped double quotes (real example:  ...",""2" BROTHER COFFEE SHOP","... )

Would love to see other jurisdictions adopting! I know Louisville, KY also makes their data available, per:
31 August 2012 15:47:00
Glad you like it. The new files are actually generated from the existing .zip that NYC Health currently publishes. If you're interested in the full list of error corrections, check this out: