Open Health Inspection Specification

From NYC OpenData Tech Standards

NOTE: This specification is being drafted iteratively. Data consumers should be prepared to adapt to frequent format changes until the specification stabilizes.

As of November 20, 2012, this specification has been brought into alignment with the Open Health Inspection Standard (OHIS), which was authored by John Boiles with contributions from multiple U.S. cities.

Introduction

This is a preliminary/proposed specification for government (or government-delegated) entities that:

Coverage Considerations

Schema

Data must be provided through several CSV files, delivered within a single .zip archive with a flat hierarchy (no sub-directories/folders).

The following requirements apply to those files:

Files

All required CSV files and any available optional CSV files must be collected in a single ZIP file.
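As a rough illustration of the packaging rule, the sketch below builds a flat archive in memory and checks an existing archive for sub-directories and missing required files. The function names (build_feed_zip, check_feed_zip) are hypothetical, and UTF-8 encoding of the CSV text is an assumption, since this specification does not name an encoding here.

```python
import io
import zipfile

# File names taken from the sections of this specification.
REQUIRED = {"businesses.csv", "inspections.csv"}
OPTIONAL = {"feed_info.csv", "violations.csv", "legend.csv"}


def build_feed_zip(files):
    """Pack {filename: csv_text} into an in-memory ZIP with a flat layout."""
    missing = REQUIRED - set(files)
    if missing:
        raise ValueError(f"missing required files: {sorted(missing)}")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            if "/" in name or "\\" in name:
                raise ValueError(f"sub-directories are not allowed: {name}")
            # writestr encodes str data as UTF-8 (an assumption, see above).
            archive.writestr(name, text)
    return buffer.getvalue()


def check_feed_zip(data):
    """Return a list of problems found in a feed archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    for name in names:
        if "/" in name or "\\" in name:
            problems.append(f"not flat: {name}")
    for name in sorted(REQUIRED - set(names)):
        problems.append(f"missing required file: {name}")
    return problems
```

A publisher could run check_feed_zip on the finished archive before posting it; an empty list means the two structural rules above are satisfied.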

Feed Information (feed_info.csv)

The feed_info.csv file contains information about the feed itself. This file should contain only a single row. This file is optional.

Name | Type | Required | Description
feed_date | date | Yes | Date this feed was generated, in YYYYMMDD format
feed_version | string | Yes | Version of the OHIS specification used to generate this feed. For example, '0.4.1'
municipality_name | string | Yes | Name of the municipality providing this feed. For example, 'San Francisco' or 'Multnomah County'
municipality_url | string | No | URL of the publishing municipality's website
contact_email | string | No | Email address of the person to contact regarding invalid data in this feed
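The following sketch shows one way a publisher might generate feed_info.csv with its single data row. The function name feed_info_csv is hypothetical, and writing a header row of field names is an assumption about the file layout rather than something stated in this section.

```python
import csv
import io
from datetime import date

FEED_INFO_FIELDS = [
    "feed_date", "feed_version", "municipality_name",
    "municipality_url", "contact_email",
]


def feed_info_csv(feed_date, feed_version, municipality_name,
                  municipality_url="", contact_email=""):
    """Return the text of a feed_info.csv with a header and one data row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FEED_INFO_FIELDS)  # header row (assumed layout)
    writer.writerow([
        feed_date.strftime("%Y%m%d"),  # YYYYMMDD, as the table requires
        feed_version,
        municipality_name,
        municipality_url,   # optional, left empty when unknown
        contact_email,      # optional, left empty when unknown
    ])
    return out.getvalue()


print(feed_info_csv(date(2012, 11, 20), "0.4.1", "San Francisco"))
```

Using the csv module rather than string concatenation means that names containing commas or quotes are escaped automatically.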


Businesses (businesses.csv)

The businesses.csv file contains information about businesses. This file is required.

Name | Type | Required | Description
business_id | string | Yes | Unique identifier for the business. For many cities, this may be the license number.
name | string | Yes | Common name of the business
address | string | Yes | Street address of the business. For example, "706 Mission St"
city | string | No | City of the business. This field must be included if the file contains businesses from multiple cities
state | string | No | State or province for the business. In the U.S., this should be the two-letter code for the state
postal_code | string | No | Postal code. For the U.S., standard 5-digit ZIP code or ZIP+4 code
latitude | number | No | Latitude of the business. This field must be a valid WGS 84 latitude. For example, "37.7859547"
longitude | number | No | Longitude of the business. This field must be a valid WGS 84 longitude. For example, "-122.4024658"
phone_number | string | No | Phone number for the business, including country-specific dialing information. For example, "+14159083801"
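A consumer or publisher may want to check each businesses.csv row against the table above. The sketch below is one minimal way to do that, assuming rows arrive as dictionaries of strings (for example from csv.DictReader); the function name check_business is hypothetical, and the coordinate bounds are the usual WGS 84 limits of ±90 and ±180 degrees.

```python
def check_business(row):
    """Return a list of problems for one businesses.csv row (empty if none)."""
    problems = []
    # Fields marked "Yes" in the Required column must be non-empty.
    for field in ("business_id", "name", "address"):
        if not (row.get(field) or "").strip():
            problems.append(f"{field} is required")
    # Optional coordinates, when present, must be numbers within range.
    for field, limit in (("latitude", 90.0), ("longitude", 180.0)):
        value = (row.get(field) or "").strip()
        if not value:
            continue
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{field} is not a number: {value!r}")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{field} is out of range: {value!r}")
    return problems


example = {
    "business_id": "10", "name": "Example Cafe", "address": "706 Mission St",
    "latitude": "37.7859547", "longitude": "-122.4024658",
}
print(check_business(example))
```

The example business name and identifier are made up for illustration; the address and coordinates are the sample values from the table.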


Inspections (inspections.csv)

The inspections.csv file contains information about inspectors’ visits to businesses. This file is required.

Name | Type | Required | Description
business_id | string | Yes | Unique identifier of the business for which this inspection was done
score | number | Yes | Inspection score on a 0-100 scale. 100 is the highest score
date | date | Yes | Date of the inspection in YYYYMMDD format
description | string | No | Single-line description containing details on the outcome of an inspection. Use of this field is encouraged only if no violations are provided.
type | string | No | String representing the type of inspection. Must be one of the following values: initial, routine, followup
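The constraints in the inspections table (score range, YYYYMMDD dates, and the three allowed type values) can be checked mechanically. The sketch below assumes rows are dictionaries of strings; parse_ohis_date and check_inspection are hypothetical helper names, not part of the specification.

```python
from datetime import datetime

INSPECTION_TYPES = {"initial", "routine", "followup"}


def parse_ohis_date(text):
    """Parse a YYYYMMDD string into a date, raising ValueError if malformed."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not in YYYYMMDD format: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def check_inspection(row):
    """Return a list of problems for one inspections.csv row (empty if none)."""
    problems = []
    if not (row.get("business_id") or "").strip():
        problems.append("business_id is required")
    try:
        score = float(row.get("score") or "")
        if not 0 <= score <= 100:
            problems.append(f"score is outside 0-100: {score}")
    except ValueError:
        # Also reached when the required score is missing or empty.
        problems.append("score must be a number")
    try:
        parse_ohis_date((row.get("date") or "").strip())
    except ValueError as error:
        problems.append(f"date: {error}")
    kind = (row.get("type") or "").strip()
    if kind and kind not in INSPECTION_TYPES:
        problems.append(f"type is not one of {sorted(INSPECTION_TYPES)}: {kind!r}")
    return problems
```

The explicit length and digit check comes before strptime because strptime on its own accepts some inputs that are not exactly eight digits long.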


Violations (violations.csv)

The violations.csv file contains information about specific violations. This file is optional.

Name | Type | Required | Description
business_id | string | Yes | Unique identifier of the business for which this violation applies
date | date | Yes | Date of violation in YYYYMMDD format. This should correspond with the related inspection
code | string | No | Code for the violation. It is recommended that this be based on the FDA Food Code. However, municipalities can decide to use pre-existing codes for this field
description | string | No | One-line description of the violation
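Since a violation's date should correspond with the related inspection, a consumer might attach violations to inspections by matching on the pair (business_id, date). That pairing is an assumption drawn from the field descriptions, not a rule stated by the specification, and the function name link_violations is hypothetical.

```python
from collections import defaultdict


def link_violations(inspections, violations):
    """Group violations under inspections by (business_id, date).

    Returns (linked, orphans): linked maps each inspection key to its list of
    violation rows; orphans lists violations with no matching inspection.
    """
    by_key = defaultdict(list)
    for violation in violations:
        by_key[(violation["business_id"], violation["date"])].append(violation)

    inspection_keys = {(i["business_id"], i["date"]) for i in inspections}
    linked = {key: by_key.get(key, []) for key in inspection_keys}
    orphans = [
        violation
        for key, rows in by_key.items()
        if key not in inspection_keys
        for violation in rows
    ]
    return linked, orphans
```

A non-empty orphans list may indicate a date mismatch between violations.csv and inspections.csv that is worth reporting to the feed's contact address.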


Score Legend (legend.csv)

The legend.csv file contains mappings from score ranges to human-readable descriptions of those scores. Municipalities can use this file to communicate the way scores are traditionally presented. For example, 0-60 may map to ‘Fail’ or 95-100 may map to ‘A+’. Note that maximum_score is non-inclusive: for minimum_score = 70 and maximum_score = 80, any score greater than or equal to 70 and less than 80 falls within the range. This file is optional.

Name | Type | Required | Description
minimum_score | number | Yes | Minimum score that can be classified with this description
maximum_score | number | Yes | Maximum score that can be classified with this description. maximum_score is non-inclusive, meaning that only scores less than this maximum score will fall within this range.
description | string | Yes | Formatted version of the score in the format typically presented by the municipality. For example, 'A' or 'Pass'
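The half-open range rule (minimum_score inclusive, maximum_score non-inclusive) can be sketched as a simple lookup. The function name describe_score and the sample legend rows below are illustrative assumptions, not values published by any municipality.

```python
def describe_score(score, legend):
    """Return the legend description for a score, or None if no range matches.

    Each legend row is a dict with minimum_score, maximum_score, description.
    A score matches when minimum_score <= score < maximum_score.
    """
    for row in legend:
        if float(row["minimum_score"]) <= score < float(row["maximum_score"]):
            return row["description"]
    return None


# Hypothetical legend rows, as they might be read from legend.csv.
sample_legend = [
    {"minimum_score": "0", "maximum_score": "70", "description": "Fail"},
    {"minimum_score": "70", "maximum_score": "80", "description": "C"},
    {"minimum_score": "80", "maximum_score": "101", "description": "Pass"},
]

print(describe_score(70, sample_legend))
print(describe_score(80, sample_legend))
```

Because maximum_score is non-inclusive, a range meant to cover a perfect score of 100 needs a maximum_score greater than 100 under this reading, which is why the last sample row uses 101.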


Sample Data

OHIS-compatible sample datasets are available from the following jurisdictions:

Known Issues

Variations in Jurisdiction Scoring

While the US FDA provides guidelines for inspecting food service establishments, many of the major US municipalities employ different models for scoring them and making that information public.

For example:

These values are often derived from observed violations, each type of which may carry a different weight in the overall score. While it is generally agreed that there should be a national approach to scoring, requiring one in the short term may be a barrier to adoption of the specification. The current version of the specification follows the San Francisco scoring model.

26 May 2012 16:43:20
Should we remove cuisine_type, even though it's optional? It will probably have different pre-defined values between jurisdictions; not sure if it would serve a huge value to the data consumers.
26 May 2012 17:25:24
Anyone have any thoughts on how to merge this structure in? http://databases.sun-sentinel.com/Orlando/orlandoRestaurantInspections3/ftlaudRestaurantInspections_view.php?editid1=4298349
27 May 2012 16:17:48
I think cuisine_type should be included, and perhaps, even make it a multi-value field (e.g. irish, microbrewery, chinese, fusion)
29 May 2012 16:46:20
I'm all for optionally including it, but generally I think it should only be included if it's a factor which impacts the score in some way (for example, perhaps certain aspects of an inspection are not necessary in certain types of establishments).
29 May 2012 19:12:44
Point taken. The bigger issue might be whether there is a "central" dataset about restaurants where cuisine type info should live. It appears that the Inspection Results data is the "de facto" place to get this information. Also, for the venue code, will that be a citywide code that can be used in other datasets, or will it just be the UID for this particular dataset?
30 May 2012 04:57:31
Yes- it makes sense to have a separate dataset of licensed establishments. You're right that Restaurant Inspections are the de facto way to find that information.

For the venue code, in theory it would be a unique identifier across multiple datasets. I will add that into the description as a recommendation (but not necessarily a requirement).
30 May 2012 16:32:25
That kind of detail is really very useful.  It may prove instructive to look at the native schema of the DOHMH database from which the current Restaurant Inspection Results extract is taken to see what other properties can be captured, now that you're looking into creating a Restaurant Inspection specification and considering a dataset for licensed establishments.
8 August 2012 18:45:59
Great start!  Very pragmatic and formulated for expedited adoption.  It would be nice though if the spec also calls for some file naming convention.  Perhaps, something similar to http://nycopendata.pediacities.com/wiki/index.php/City_Standards#File_Name_Conventions but with some tweaks.
30 August 2012 14:27:45
Just wanted to say thanks for this; it's excellent! After getting a relatively large number of MalformedCSV (Ruby) exceptions thrown by the original NYC data file, I got _none_ processing the beta-OHIS restaurants.csv.

For context, the primary errors I receive in the current NYC file are mostly related to the "dba" (restaurant name; "doing business as") field, such as unescaped double quotes (real example:  ...",""2" BROTHER COFFEE SHOP","... )

Would love to see other jurisdictions adopting! I know Louisville, KY also makes their data available, per: http://codeforamerica.org/2012/07/10/from-zero-to-civic-in-5-minutes/
31 August 2012 15:47:00
Glad you like it. The new files are actually generated from the existing .zip that NYC Health currently publishes. If you're interested in the full list of error corrections, check this out: https://github.com/technickle/NYC-OHIS-Generator/blob/master/restinsp.sh