Case and Geographic Identification

IHDS households are uniquely identified by the combination of stateid + distid + psuid + hhid + hhsplitid (where “+” signifies concatenation, not addition). In the individual file, persons are uniquely identified by the combination of those variables + personid. Several other identification variables are available to assist in sorting and merging files and for identifying geographic areas. See the merging page for help in merging files.

Variable   Obs    Unique  Mean       Min       Max        Label
stateid    41554  33      18.78      1         34         State code
distid     41554  61      14.69      0         68         District code
psuid      41554  39      5.76       1         39         PSU: village/neighborhood code
hhid       41554  52      9.22       1         52         Household ID
hhsplitid  41554  8       0.41       0         7          Split household ID
caseid     41554  41554   NA         NA        NA         HH id: 11 byte string
idhh       41554  41554   181680422  10201010  340006150  HH id 9-digit unique
idpsu      41554  2474    189288     10201     340006     PSU id 6-digit unique
stateid2   41554  22      483.18     101       733        State codes, collapsed
distname   41554  373     1892.8     102       3400       District codes with names
dist01     41554  61      14.67      0         68         H1sp: District ID Census 2001
urban      41554  2       0.36       0         1          Census: 2001 village/town
metro6     4133   6       2.97       1         6          Largest 6 metro areas 1-6
sweight    41554  1526    4623.48    220       308216.4   Design weights

caseid: There is a string variable named caseid in both the hh and ind files, but they are different variables. In the hh file, caseid uniquely identifies each household (i.e., = stateid + distid + psuid + hhid + hhsplitid), while in the ind file caseid uniquely identifies each person (i.e., = stateid + distid + psuid + hhid + hhsplitid + personid).
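The difference between the household-level and person-level keys can be illustrated with a minimal Python sketch. The records below are invented for illustration, and the helper name duplicate_keys is hypothetical; only the variable names come from the IHDS files. The sketch compares tuples of the component variables rather than rebuilding the caseid string, because the exact string layout of caseid is not described here.

```python
from collections import Counter

# Component variables that identify a household and a person, respectively.
HH_KEY = ("stateid", "distid", "psuid", "hhid", "hhsplitid")
IND_KEY = HH_KEY + ("personid",)


def duplicate_keys(records, key_vars):
    """Return the key tuples that occur more than once in records."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Invented person-level records: two people in one household, one in another.
persons = [
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 0, "personid": 1},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 1, "hhsplitid": 0, "personid": 2},
    {"stateid": 1, "distid": 2, "psuid": 1, "hhid": 2, "hhsplitid": 0, "personid": 1},
]

# In a person-level file the household key repeats across household members ...
hh_dups = duplicate_keys(persons, HH_KEY)
# ... while the household key plus personid does not.
ind_dups = duplicate_keys(persons, IND_KEY)
```

Running the same check on the real hh file with HH_KEY, or on the ind file with IND_KEY, should return an empty list if the keys behave as documented.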

idhh: idhh is a long integer variable that uniquely identifies each household. idhh is calculated as stateid*10000000 + distid*100000 + psuid*1000 + hhid*10 + hhsplitid.
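The idhh formula can be written out as a short Python sketch. The function names make_idhh and split_idhh are hypothetical. The decoding step assumes every component stays within its digit width (distid, psuid, and hhid below 100, hhsplitid below 10), which holds for the ranges listed in the table above.

```python
def make_idhh(stateid, distid, psuid, hhid, hhsplitid):
    """Combine the five components using the documented idhh formula."""
    return (stateid * 10_000_000 + distid * 100_000
            + psuid * 1_000 + hhid * 10 + hhsplitid)


def split_idhh(idhh):
    """Recover the components, assuming each fits its digit width."""
    stateid, rest = divmod(idhh, 10_000_000)
    distid, rest = divmod(rest, 100_000)
    psuid, rest = divmod(rest, 1_000)
    hhid, hhsplitid = divmod(rest, 10)
    return stateid, distid, psuid, hhid, hhsplitid


# The minimum idhh in the table, 10201010, corresponds to these components.
smallest = make_idhh(1, 2, 1, 1, 0)      # 10201010
# The maximum idhh in the table decodes to state 34, district 0, PSU 6.
largest = split_idhh(340006150)          # (34, 0, 6, 15, 0)
```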

idpsu: idpsu is a long integer variable that uniquely identifies each primary sampling unit (PSU). It is useful for identifying survey clusters in some statistical analyses. idpsu is calculated as stateid*10000 + distid*100 + psuid.
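As a sketch, idpsu can be computed from its components or derived from idhh. The function names below are hypothetical. Deriving idpsu from idhh relies on the idhh formula given above and assumes hhid*10 + hhsplitid is below 1000, which holds for the ranges in the table.

```python
def make_idpsu(stateid, distid, psuid):
    """Combine state, district, and PSU codes using the documented formula."""
    return stateid * 10_000 + distid * 100 + psuid


def idpsu_from_idhh(idhh):
    """Drop the last three digits of idhh (hhid*10 + hhsplitid) to get idpsu."""
    return idhh // 1_000


# Both routes give the same cluster identifier for the same household.
from_parts = make_idpsu(1, 2, 1)           # 10201
from_idhh = idpsu_from_idhh(10201010)      # 10201
```

Households that share an idpsu value belong to the same sampling cluster, so the result can serve as the cluster variable in analyses that adjust for the survey design.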

stateid2: stateid2 is a slightly collapsed version of stateid that creates 22 states and state groups from the 33 states in IHDS. stateid2 also sorts the states into a slightly different regional order. Chandigarh is collapsed into Punjab, Daman and Diu and Dadra and Nagar Haveli are both collapsed into Gujarat, Goa is collapsed into Maharashtra, and Pondicherry is collapsed into Tamil Nadu. All Northeast states and Sikkim are treated as a single group.

dist01: District identifiers in IHDS (distid) are generally the census 2001 district identification number. In a small number of cases (317 households), distid does not record the correct census code. The correct 2001 census code is always given in dist01. However, dist01 should not be used for sorting or identifying households since PSU and household ids are not unique within dist01. stateid is always the 2001 census state code.

distname: distname is a 4 digit integer code combining the 2001 census state and district codes. Value labels provide the name of each district.

urban: This dichotomy identifies every primary sampling unit that was in an urban area as identified by the 2001 census. It differs slightly from the code recorded in the urban/rural variable on the household survey, id9, which was created from the sampling design. Nineteen PSUs that were rural in the sampling design were changed to urban areas in the 2001 census.

sweight: The IHDS sample is a complex combination of rural and urban samples (see the section on samples). To calculate population estimates for India or for individual states, sweight is needed as a design weight in all analyses.
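The effect of a design weight on a simple estimate can be sketched in Python. The values and weights below are invented for illustration, and the function name weighted_mean is hypothetical. The sketch covers only the point estimate; standard errors that respect the sample design need survey software and the cluster identifiers.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w), the weighted mean of values."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented household values and design weights (not real IHDS data).
income = [100.0, 200.0, 400.0]
sweight = [1000.0, 3000.0, 1000.0]

unweighted = sum(income) / len(income)     # treats every household equally
weighted = weighted_mean(income, sweight)  # 220.0 with these invented numbers
```

Here the second household represents three times as many households in the population as the others, so the weighted mean is pulled toward its value.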

metro6: The six largest metropolitan areas (Mumbai, Delhi, Kolkata, Chennai, Bangalore, and Hyderabad) are identified as codes 1-6 in metro6. Households in all other areas are coded as missing. Metropolitan areas were defined as any district included in the census definition of “urban conglomerates” for each of these six areas. These districts often include both urban and rural areas; all parts are included in the metro6 categories. Delhi is a slight exception to this definition. The Census of India does not allow urban conglomerates to cross state boundaries, although parts of U.P. and Haryana are clearly part of the larger Delhi metropolitan area. To correct for this, Gurgaon district in Haryana and Ghaziabad and Gautam Buddha Nagar districts in U.P. are included as part of the Delhi metropolitan area.
