
Checking for null values in a Python lambda: applying a lambda function to pandas data that contains nulls

I am trying to split one column into two, but I know my data contains null values. Consider this DataFrame:

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', None, 'fruit: pear'], columns=['text'])

df

                   text
0          fruit: apple
1  vegetable: asparagus
2                  None
3           fruit: pear

I want to split it into multiple columns, like this:

df['cat'] = df['text'].apply(lambda x: 'unknown' if x is None else x.split(': ')[0])

df['value'] = df['text'].apply(lambda x: 'unknown' if x is None else x.split(': ')[1])

print(df)

                   text        cat      value
0          fruit: apple      fruit      apple
1  vegetable: asparagus  vegetable  asparagus
2                  None    unknown    unknown
3           fruit: pear      fruit       pear

However, suppose the missing entry is np.nan instead of None:

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', np.nan, 'fruit: pear'], columns=['text'])

Then the same split raises an error:

df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in ()
      1 df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], columns = ['text'])
      2 #df.columns = ['col_name']
----> 3 df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])
      4 df['value'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[1])

C:\Python27\lib\site-packages\pandas\core\series.pyc in apply(self, func, convert_dtype, args, **kwds)
   2158             values = lib.map_infer(values, lib.Timestamp)
   2159
-> 2160         mapped = lib.map_infer(values, f, convert=convert_dtype)
   2161         if len(mapped) and isinstance(mapped[0], Series):
   2162             from pandas.core.frame import DataFrame

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62187)()

in (x)
      1 df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], columns = ['text'])
      2 #df.columns = ['col_name']
----> 3 df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])
      4 df['value'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[1])

AttributeError: 'float' object has no attribute 'split'
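For context on the error: NaN never compares equal to anything, including itself, so the test x == np.nan is always False and the float NaN falls through to x.split. A minimal sketch of one way around it, assuming pd.isnull is an acceptable null test (it is True for both None and NaN); the helper name split_part is hypothetical and used only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', np.nan, 'fruit: pear'],
                  columns=['text'])

# NaN is not equal to anything, not even itself, so `x == np.nan` can never be True.
nan_equals_itself = (np.nan == np.nan)


def split_part(x, index):
    """Hypothetical helper: return one piece of 'cat: value', or 'unknown' if x is missing."""
    if pd.isnull(x):          # True for both None and NaN
        return 'unknown'
    return x.split(': ')[index]


df['cat'] = df['text'].apply(lambda x: split_part(x, 0))
df['value'] = df['text'].apply(lambda x: split_part(x, 1))
```

The same lambda shape as in the question also works once the comparison is replaced: 'unknown' if pd.isnull(x) else x.split(': ')[0].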

How can I do the same split when the missing values are NaN?

Is there a better way to apply a split function that ignores null values?
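One possible approach, sketched here rather than offered as the definitive answer, is the vectorized string accessor: Series.str.split leaves missing entries as missing instead of raising, and expand=True returns the pieces as separate columns. The fill value 'unknown' is taken from the question; the intermediate name parts is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', np.nan, 'fruit: pear'],
                  columns=['text'])

# n=1 splits on the first ': ' only; expand=True yields one column per piece.
# The row holding NaN comes back as missing in every resulting column.
parts = df['text'].str.split(': ', n=1, expand=True)

# Replace the missing pieces with the placeholder used in the question.
df['cat'] = parts[0].fillna('unknown')
df['value'] = parts[1].fillna('unknown')
```

This avoids writing the null check by hand, at the cost of a separate fillna step when a placeholder such as 'unknown' is wanted.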

Suppose this were not a string example, and I had the following instead:

df = pd.DataFrame([2, 4, 6, 8, 10, np.nan, 12], columns=['numerics'])

df['numerics'].apply(lambda x: np.nan if pd.isnull(x) else x/2.0)

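I feel that Series.apply really ought to accept an argument telling it to skip null rows and pass them through as null in the output. I have not found a better generic way to transform a Series without manually guarding against null values.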
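Something close to that wish exists on Series.map rather than Series.apply: its na_action='ignore' option propagates missing values without passing them to the function. The sketch below assumes map is an acceptable substitute for apply in this elementwise case; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([2, 4, 6, 8, 10, np.nan, 12], columns=['numerics'])

# na_action='ignore' skips NaN entries and leaves them as NaN in the result,
# so the function itself needs no null check.
halved_map = df['numerics'].map(lambda x: x / 2.0, na_action='ignore')

# For plain arithmetic, vectorized operations already propagate NaN.
halved_vec = df['numerics'] / 2.0

# The same option works for the string case: the lambda is never called on NaN.
text = pd.Series(['fruit: apple', np.nan, 'fruit: pear'])
cats = text.map(lambda x: x.split(': ')[0], na_action='ignore')
```

For simple numeric transformations the vectorized form is usually enough; na_action='ignore' is mainly useful when the function would fail on a missing value, as x.split does.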