www.thesuniljain.com

We provide raw data & guard against bias
Thursday, 10 February 2011 00:00

Sunil Jain, KG Narendranath & MK Venu

Stand up and be counted, is the gentle advice from the Registrar General of Census to people who complain about lack of development and inadequate resource allocation. With the mammoth exercise of Census 2011 on, C Chandramouli speaks to Sunil Jain, KG Narendranath & MK Venu in an interview about the objectivity of the process and the utility of the data to the government, policymakers and the private sector. He also discusses the sensitive issue of the caste census and how the enumeration process and the UID mechanism will work together.

Census 2011 is under way and for the first time, a National Population Register is also being created. Tell us how the two exercises differ in terms of their conduct, content and purpose.

This is the fifteenth national population census since the first one, done in 1872 albeit in a non-synchronous manner, and the seventh since Independence. Starting in 1881, the decennial census has been held unfailingly with the express purpose of collecting specified data – demographic, marital, cultural, literacy-, migration- and fertility-related, and economic (which includes work status). While this exercise is being done under the Census Act 1948, the National Population Register (NPR) is prepared under the Citizenship Act, which saw major amendments in 2003. The census aims at an accurate de facto headcount of all persons on Indian soil on the deemed date, i.e. 00.00 hours on March 1, 2011 as far as the current census is concerned. So, all people who are on our seas, including those in vessels berthed at our ports at the specified time, and Indians in our missions abroad become part of our population.

The census data is published only in aggregates – at the levels of village, ward, town, tehsil, district, state and nation. Individual information is kept anonymous and confidential. This is despite the fact that since the last (2001) census, 100% of the data collected is captured and processed. However, the NPR database and its sub-set, the National Register of Indian Citizens, would be finalised only after displaying the list of local residents in the respective panchayat or municipal ward. The locals will have the right to register their claims and objections, on the basis of which corrections in the list would be made.

So, though we piggyback on the census to build NPR, creation of the population register is an exercise under a separate Act. We have already collected 15 pieces of identity-related data of a billion-plus people living in 30 crore houses in the first phase of the census (April-September 2010). The information collected would help build the NPR.

How does the first phase of census differ from the second phase this month?

The first phase was for what is called house-listing and housing census. Each enumerator visited about 135 houses during the phase, gave them numbers, and collected data on the nature and composition of the house, ownership status, amenities available and assets possessed by the household. Certain identity-related information about the members of the household was also collected during the phase. Even if the enumerator found a house closed/unoccupied during the first visit, he/she made a few more attempts to verify the status. The second phase, called population enumeration, kicked off on February 9 and will end on the last day of the month. In this phase, 29 questions will be canvassed so as to collect specified demographic, economic and social data of the de facto occupants of the houses which were identified in the first phase.

How will you keep, process and disseminate the data being collected? Also, what is the policy as to the effective use of this vast repository of information?

For the government and its various instrumentalities, the census data is an aid to informed decision-making. Our data is observational in nature and is the product of house-to-house enumeration, which means its integrity is very high. The data is classified, analysed and disseminated according to pre-determined parameters, and the registrar of census has no bias for or against any of them. With research and analytics coming up in a big way in every industry and being central to the making of business strategies, entities that look at data as a business model would indeed want to source census data and customise it to the needs of their clients. With our data being specific down to the level of a ward in a town, any advertiser would want to rely on it. For the private sector, obviously, the data is of immense value in the making of various projections.

People often complain and even cavil about lack of development, inadequate allocation of resources etc. If that is so, they must note that the census is a tool to get to the actual situation on the ground. People must unhesitatingly seize this opportunity as the census data is a vital input that goes into policy-making, resource allocation and designing of government programmes. Therefore, my advice to everyone is to not get left out of this exercise. If you don't cooperate with the census, then there is no point in saying later that decisions have been taken in a vacuum and have no empirical basis. It is only in urban areas that we are facing the problem of enumerators not being facilitated by the people.

When politicians started losing their nurtured constituencies after the last census data was released, there came a realisation that the census indeed has a purpose to serve. The 2001 census data brought out an alarming fall in the sex ratio. That eventually led to policy interventions to ensure the birth of and protect the girl child. Similarly, the total sanitation programme was an offshoot of the last census, which showed how many houses are without toilets, with a very dispersed location specificity. There would be more reasons for people to realise the utility of the census once the 2011 census data is published. For instance, there is no data on internet penetration today and census 2011 will produce it. In the past, the census could bring out stunning pieces of data. The last census, for example, revealed that more houses in the country own TV sets than have bathrooms.

We hope that the NPR would facilitate better targeting of services under government schemes and programmes.

The question of privacy has often been raised in regard to the census process. What are the restrictions when it comes to disclosure and use of the data?

We don't collect data about individuals that is not otherwise in the public domain. The breach of privacy, if at all, is restricted to publication of the NPR database in the local area for authentication. As I said earlier, the more elaborate census data is provided only at the aggregate level – till the level of village or municipal ward – so as to not violate the privacy of individuals.

The courts have also turned down pleas for sharing of individual data collected during the census-taking. Although census data is now fully digitised, it is available only from the lowest level of aggregation, which is the ward in a town. One can get the data in the specified aggregates from the registrar general. The classification of the data with the aid and advice of the expert panels from the respective states could allow us to respond to specific queries like, say, how many Scheduled Caste persons with a PhD are not living in a pucca house? We have certain standard parameters for tabulation of the data.

What are the pieces of information that you will collect in the current census which you have not been garnering in the past?

As for gender, a third category of “others” (transgender) is included. As far as marital status is concerned, separate codes are assigned for “separated” and “divorced.” Questions on disability have been modified to obtain information more specific to the nature of the disability. As regards work status, a new category for those who worked for less than 3 months in the past year is introduced, which would help gather more accurate information on the level of unemployment. Until the last census, one's employment status was defined as non-worker, marginal worker (one who worked for up to six months) and main worker (more than six months). Among the marginal workers, now there would be two categories – those who worked up to 3 months and those who remained employed for 3-6 months.
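
The work-status categories described above can be sketched as a small classification rule. This is only an illustration, not the census office's own coding scheme: the function name and label strings are hypothetical, and the handling of the exact boundaries (someone who worked exactly 3 or exactly 6 months) is an assumption, since the interview says both "less than 3 months" and "up to 3 months".

```python
def work_status(months_worked: float) -> str:
    """Classify a person by months worked in the past year.

    Hypothetical helper; boundary handling is an assumption:
    under 3 months, 3 to 6 months inclusive, more than 6 months.
    """
    if months_worked <= 0:
        return "non-worker"
    if months_worked < 3:
        # New category introduced in the 2011 census
        return "marginal worker (under 3 months)"
    if months_worked <= 6:
        return "marginal worker (3-6 months)"
    return "main worker"


for months in (0, 2, 4, 9):
    print(months, "->", work_status(months))
```

Under the 2001 scheme, the two marginal categories would simply collapse into one.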

But, how do you define work?

Ours is a disinterested endeavour in this regard also. We treat any economic activity as work. We also undertake the National Industrial Classification exercise under which there are four main categories – cultivators, agriculture labourers, household industrial workers and other workers.

Do you do income profiling?

Not really, but there are proxy indicators of income that the census data would bring out. For instance, information would be gathered on household amenities and assets. Till the 2001 census, the enumeration of the population was done on a 100% basis but only a percentage of the data was processed (45% in the 1991 census). The data could be fully captured in the 2001 census. So, when the 2011 census database is created, we would be able to cross-tab for the first time, which would help identify and analyse the trends in income status.

Don't you need to reduce the time lag between the census-taking and the publication of the data?

There has already been an improvement in this regard. While data used to be released 8 to 9 years after the enumeration was done, it took only 4 years to bring out the 2001 census data. We hope 2011 data could be released within 2 years thanks to improved technology that would allow speedier processing. The Indian census is a mammoth exercise involving 2.5 million enumerators who need to cover the country's 640 districts, 8,000 towns and 6.4 lakh villages. About 5.4 million instruction manuals are created in 18 languages, besides another 340 million census schedules produced in 16 languages. The exercise is undertaken with a budget of Rs 2,200 crore, which means a per capita cost of Rs 18.
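
The per capita figure quoted above can be checked with simple arithmetic. The sketch below assumes a population of about 1.2 billion; the interview itself only says "a billion-plus", so that number is an assumption for illustration, as are the variable names.

```python
CRORE = 10_000_000  # 1 crore = 10 million

budget_rupees = 2_200 * CRORE        # Rs 2,200 crore, as stated in the interview
assumed_population = 1_200_000_000   # assumption: roughly 1.2 billion people

per_capita_rupees = budget_rupees / assumed_population
print(f"Per capita cost: about Rs {per_capita_rupees:.1f}")
```

With this assumed population the result rounds to Rs 18, in line with the figure given in the answer.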

How do you plan to go about collecting the biometric data and facilitating the 16-digit UID number for Indians?

After having collected the necessary data in the ongoing second phase (population enumeration), we will scan 100% of the data and convert the scanned images into a database by typing it out on the screen. Starting April, we will do another round of data collection; this time, the photographs of all above the age of 15 and biometric data (fingerprints/iris scan) will be gathered. The database would be provided to the UID Authority, which will do the de-duplication, give the UID numbers and then return the data to us. The UIDAI would also intimate the people of the number allotted. It is our job to generate the NPR cards, where the information would be encrypted. The biometric data collection is estimated to cost some Rs 8,000 crore. The cost of cards is yet to be worked out, but it is surmised that each card could cost Rs 50 or so.

Our relationship with the UIDAI is that of its biggest registrar. We do house enumeration, with authentication done through a rigorous process of publication and resolution in the local bodies. UIDAI has adopted a multiple-registrar model.

Do you take care to normalise the data?

We only bring out raw data and facilitate empirical decision-making. One could recall the row over definition of women's work. The Supreme Court observed that there was a gender bias against women in the Census of India definition of work. The court felt the need to evaluate the household work and accused us of being gender-insensitive. We had to respectfully submit to it that the registrar of census does not define work or economic activity. The national system of accounts determines that. What we do is simply collect information and say that as per your (the government's) definition, so many are workers, so many are non-workers and so forth. We don't come out with our own analysis or definitions. We are the largest producer of thematic maps in the country. We bring out language, housing, administrative and SC/ST atlases. These show, for instance, which language is spoken where, how many people in a state speak which language, what is the housing status of which scheduled caste, etc.

Manual cartographic methods were used for mapping till the 1981 census. In the last three censuses, GIS software was used to produce digital maps. When it comes to mapping, a new feature of the census 2011 is the creation of satellite imagery based digital maps at the street and building level in 33 cities that are central/state capitals.

We also bring out what is called Census India, a CD-based product from which you can take various data sets and generate your own maps, develop corresponding graphics and do whatever analysis you might be interested in. As part of the current census exercise, we also plan to come out with information sufficient for comparative analysis like, say, the change in population density between 1901 and 2001. On the assets owned by households, we don't have information prior to 2001, but with the 2011 data, comparative studies would be possible in this area also.

What will be your approach to sensitivities involved in the enumeration of caste as part of the census 2011?

As a matter of policy, enumeration of caste (other than SC/ST) has not been done in the country after 1931. Now that the Cabinet has discussed the sensitive issue in three meetings and decided to go for a caste census, we are bound to do that. Information on caste will be collected not during the population enumeration exercise this month, but in a separate phase spanning June to September. The question to be posed is simply “what is your caste” and whatever answer is returned would be registered. The enumerator has no business or expertise to classify a caste as OBC or forward. The classification would be done later by a group of experts in consultation with the respective state.

The issue is very complex given the hundreds of castes and sub-castes in the country and the fact that a caste which is “backward” in one state could be deemed “forward” in another. There is a central list of OBCs and also state-specific lists. While some states do not have a list of OBCs, some have a list of OBCs with the sub-set, Most Backward Classes. There are also open-ended categories in the lists such as orphans and destitute children.

I want to make it clear that OBC is not a concept that we have in mind while collecting the caste data. It is not our job to classify castes under various groups. We are doing SC/ST classification because we are constitutionally allowed to do that. The registrar of census doesn't do any analysis on this front, to avoid any bias.

Why can't the caste census and the biometric data collection also be done along with the population enumeration so that cost can be reduced?

First of all, the state of logistics doesn't permit collection of all data in one phase. Also, there is a concern that the caste census, if held along with population enumeration, could adversely impact the accuracy of the headcount and undermine the integrity of the census data.

What definition of urbanisation do you go by?

Urbanisation in India is basically defined in two different ways. One definition is under the Nagar Palika Act, where a locality is deemed urban irrespective of its size (population) or any other development parameters, which is mostly a political decision. So, under this measure, there could be villages with a population as high as 30,000 and all the characteristics of a town that are still reckoned as villages. This could be because the relevant body thought that if you make it a town, taxation etc could be impacted, and it is therefore a political decision to call it a village.

The second set of towns are those which satisfy certain empirical criteria: a minimum population of 5,000, density of population, a certain share of male workers in non-agricultural work and so on. These are scientific criteria. On the basis of census 2001, we have identified roughly 8,000 towns. Census 2011 is unlikely to result in any big change in this set as last time we had included all those towns which fell on the borderline. The size class of towns is decided by the urban development department.

Tell us about annual health surveys.

There is a civil registration system for births and deaths. We also do the sample registration system in which the rates of birth and death are captured. The annual health survey is done in 8 backward states and Orissa and Assam. Here, we not only collect the data relating to birth and death rates, disease pattern, morbidity, mortality etc, but also do anthropometric testing. The aim of the forthcoming survey is to produce data in aggregates down to the district level, whereas in the past, there was no disaggregation of the data beyond the state level. Soon, we will be doing the first health survey in which information on blood pressure, blood grouping, sugar levels etc would be collected. This would help identify pre-disposing disease factors of the society with a high level of location specificity, which would be useful in the devising of healthcare policies and deciding Budget priorities. By April, we will be able to give the first set of figures at the district level.

How do you ensure quality of the data?

The head of each household is given an opportunity to verify how the enumerator has recorded the answers returned by him to the census questionnaire. His signature/thumb impression is obtained. In the case of NPR, the authentication at the village/ward level is a check. Besides, a post-enumeration survey will be done shortly after the census. This is an independent sample survey that replicates the census to find out the level of omission and content errors. The omission rate in Indian census is in the range of 1.8-2.2%, while up to 2% omission is globally accepted. Given our huge population, this is still a big number.

In terms of comprehensiveness, how does the Indian census compare with the western countries and China?

The European countries and China use two sets of forms – one, a short form of 5-8 questions, and another which is longer but meant for a sample of the population. We have two long forms which are filled up for every household. Considering the size of the population and the meticulous house-to-house enumeration, ours is truly a unique exercise.

 
